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SYSTEM AND METHOD FOR THE STORAGE. INDEXING AND RETRIEVAL 
OF XM L DO CUMENTS USING RELATIONAL DATABASES 

This application claims priority under 35 USC §§ 1 19 and 120 from US 
5 Provisional Patent Application No. 60/169,101 filed December 6, 1999. 

gfKfkgTPtind of the Invention 

This invention relates generally to a system and method for storing documents 
in one format in a database having a different format and in particular to a system and 
method for storing and retrieving extensible Markup Language (XML) documents 
10 using a relational database. 

Tbe new extensible Markup Language (XML) protocol is poised to become the 
lingua franca of the Internet for capturing and electronically transmitting information. 
The advantage of XML, as compared to the older hypertext markup language protocol 
(HTML), is that it contains tags which render semantic significance to the information 
15 between the tags (e.g., the text between the tags is the last name of an author). In 
contrast, HTML tags are used primarily for specifying how the information is to be 
displayed in a browser (e.g., show the text between the tags in bold Arial font). 
Additionally, using known extensible Stylesheets (written in XSL), one may specify 
not only the format of how different XML elements are to be shown in a browser, but 
20 also tbe order in which they are to be displayed. These features of XML give a user 
much greater power and flexibility in searching for relevant information since a search 
may be performed using the tags that contain the semantic information. In addition, 
XML permits examining the information from different perspectives once it is found 
by the user. 
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To take fall advantage of the possibilities that the XML protocol affords, it is 
desirable to devise an efficient means of storing, indexing and retrieving (via queries) 
XML documents. Typical RDMS, ODMS and flat files are slow and inefficient at 
5 storing XML documents. A preferred way of building Document Object Model 

(DOM) representations of the XML documents and then traversing the resulting trees 
to locate relevant nodes is only acceptable for small documents since memory becomes 
a limiting factor when the XML documents approach even moderate sizes. In addition, 
searches are not optimal since all searches must begin at the root of the document 
10 . inslead of at any node in the document. Moreover, it is not possible to search across a 
collection of documents (e.g. poems, novels, short stories and plays) for a particular 
character or the author. 

At the same rime, XML documents present unique challenges to storage in 
relational databases since their semi-structured nature often leads to a proliferation of 
15 tables when normalization is carried out. Given that relational database technology has 
seen great strides over the past couple of decades, it would be desirable and useful to 
provide a clean way of representing XML documents in relational terms. It is therefore 
the goal of the present invention to provide a system and method for the storage, 
. indexing and retrieval of XML documents using relational databases. 

20 Summary of tiieJnvention 

A system and method for storing, indexing and retrieving XML documents in a 
relational database is provided in accordance with the invention. The method may 
include identifying and assigning properties and encodings to the nodes of an XML 
document that will make them amenable to storage and retrieval using relational 
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databases. The method has several advantages. It allows the system to capture and 
reproduce the structure of not only the whole document, but fragments of each 
document as well. It also permits a user to traverse the XML tree, figuratively, by 
means of string manipulation queries instead of following pointers in memory or 
5 computing joins between tables, which are computationally more expensive operations. 
Finally, the properties and encodings that are attached to the nodes are compact and 
can be effectively indexed, thus enhancing the performance of queries against the 
database. 

The system in accordance with the invention uses any relational database 
1 0 management system to store the XML documents so that the system and method are 
not dependent on any particular relational database implementation. The system 
permits a user to search through the XML documents stored in the relational database 
from any node element without starting .from the root element of the document. This 
• provides optimal efficiency during search and retrieval that can not be obtained using 
15 other methods today. In addition, a document may be constructed from any node and 
its descendants. The system also permits documents conforming to any XML schema 
to be stored in an efficient manner. The system can also store any well formed XML 
document that do not conform to any schema or DTD (Document Type Definition). 
This is an important feature as a large majority of XML documents generated do not 
20 conform to a schema or DTD. 

In accordance with the invention, the system may include a converter and a searcher 
thai permit XML documents to be stored in the relational database and retrieved from a 
relational database using typical SQL queries. In a preferred embodiment, the 
converter and searcher may be one or more software modules being executed by a 
25 central processing unit on a computer system. In accordance with the invention, the 
method for storing the XML documents may include the steps of generating an 
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XMLName value for each element in the document tree, generating a NamePath value 
for each node of the document and generating an OrderPath value for each node of the 
document. Collectively, assigning values to these elements-arc called encodings. 
These encodings result in efficient storage, indexing and searching of XML documents 
5 -without destroying the underlying hierarchical structure of the documents. The 

retrieval of the XML documents once they are in the relational database is relatively 
easy since typical string matching SQL queries may be used. 

Thus, in accordance with the invention, a computer system and method for 
manipulating an XML document using a relational database is provided. The system 
10 comprises a convener that receives an XML document and generates a set relational 
database tables based on the hierarchical structure of XML a database for storing the 
relational database tables, and a searcher for querying the generated relational database 
table in the database to locate content originally in the XML document that is now 
stored in the relational database tables wherein the located content is returned to the 
1 5 user as an XML document or a portion of an XML document as desired by the user 
which can be another software module. The invention also includes the searcher that 
can convert queries specified on the XML document or document collections and 
convert them to simple SQL queries to retrieve the content desired by the user. 

In accordance with another aspect of the invention, a computer system for 
20 storing an XML document using a relational database is provided wherein the system 
comprises a converter that receives an XML document and generates relational 
database tables based on the structure of the XML document. The converter further 
comprises a software module that generates a unique name attribute for each node in 
the XML document, a software module that generates a path attribute for a particular 
25 node of the XML document wherein the path attribute comprises a list of the name 

attributes for the one or more nodes from the particular node to a root node of the XML 
document, a software module that generates an order attribute for the particular node, 
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the order attribute comprising an enumerated order of the particular node from the root 
node to the particular node, and a software module that generates a NodeValue 
attribute containing a' value of the particular node*. Collectively these attributes are 
called encodings that result in efficient storage, indexing and searching of XML 
5 documents without destroying the underlying hierarchical structure of the documents. 

In accordance with yet another aspect of the invention, a data structure that 
stores a node of interest of an XML document in a relational database is provided. The 
data structure comprises an XMLName attribute comprising a unique name for the 
node of interest, a NamePath attribute comprising a list of the XMLName attributes for 
10 the one or more nodes from the node of interest to a root node of the XML document, 
an OrderPath attribute comprising an enumerated order of the node of interest from the 
root node to the node of interest, and a NodeValue attribute containing a value of the 
node of interest. Collectively these attributes are called encodings that result in 
efficient storage, indexing and searching of XML documents without destroying the 
15 underlying hierarchical structure of the documents. 

Brief Description of the Drawings 

Figure J is a diagram illustrating a personal computer implementation of an 
XML document storage and retrieval system in accordance with the invention; 

Figure 2 is a diagram illustrating more details of the XML document storage 
20 and retrieval system in accordance with the invention; 

Figure 3 is a diagram illustrating an example of a document type definition 
(DTD) tree for an XML document; 

Figure 4 is a diagram illustrating an XML document corresponding to the table 
shown in Figure 3; 
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Figure 5 is a flowchart illustrating an example of a method for storing XML 
documents in a relational database in accordance with the invention; and 

Figure 6 is a flowchart illustrating a method for retrieving an XML document 
from a search of a relational database in accordance with the invention. 

5 Detailed DescriptiQCL_of a Preferred Embodiment 

The invention is particularly applicable to a software implemented XML 
document storage and retrieval system and method and it is in this context that the 
invention will be described. It will be appreciated, however, that the system and 
method in accordance with the invention has greater utility since it may be 
10 implemented in hardware instead of software. 

Figure 3 is a block diagram illustrating an embodiment of a software-based 
XML document storage and retrieval system 20 in accordance with the invention- In 
this embodiment, the storage and retrieval system 20 may be executed by a computer 
22. The computer 22 may be a typical stand-alone personal computer, a computer 
15 connected to a network, a client computer connected to a server or any other suitable 
computer system. For purposes of illustration only, an embodiment using a stand- 
alone computer 22 wilt be described herein. 

The computer 22 may include a central processing unit (CPU) 28, a memory 
30, a persistent storage device 32, such as a bard disk drive, a tape drive, an optical 
20 drive or the like and a storage and retrieval system 34. In a preferred embodiment, the 
Storage and retrieval system may be one or more software applications stored in the 
persistent storage device 32 of the computer that may be loaded into the memory 30 so 
that the storage and/or retrieval functionality of the storage and retrieval system may be 
executed by the CPU 28. The computer 22 may be connected to a remote server or 
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.other computer networks that permit the computer 22 to network with and share the 
stored XML document with other computers or to perform searches on XML stored 
documents on other computer systems. 

The computer 22 may further include one or more input devices 36, such as a 
5 keyboard 38, a mouse 40, a joystick or the like, a display 42 such as a typical cathode 
ray tube, a flat panel display or the like and one or more output devices (not shown) 
such as a printer for producing printed output of the search results. The input and 
output devices permit a user of the computer to interact with the storage and retrieval 
system so that the user may, for example, enter a query using the input devices and 
10 view the results of the query on the display or print the query results. 

As described below in more detail, the storage and retrieval system 34 may 
include one or more different software modules that provide XML document storage 
capabilities and XML document retrieval capabilities in accordance with the invention. 
Now, more details of the storage and retrieval system will be described. 

15 Figure 2 is a diagram illustrating more details of the XML document storage 

and retrieval system 34 in accordance with the invention. The system may include a 
converter module 50, a searcher module 52 and a relational database 54. Each of the 
modules may be implemented, in a preferred embodiment, as a software application 
being executed by a CPU as described above. The relational database 54 may be any 
20 type of relational database so that the system 34 in accordance with the invention may 
be used to store XML documents in any relational database system. 

The converter module 50 accepts XML documents, processes them and outputs 
relational dat3 about the XML documents as described below that is stored in the 
typical relational database 54. The searcher module 52 generates a user interface to a 
25 user, permits the user to enter a text string type relational database query, processes the 
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query by communicating a query to the relational database 54 and sends the results of 
the query in its original XML form to the user so that the user may view or print the 
query results; In combination, the two modules shown permit XML documents to be 
stored in any relational database system and .then permits a user to enter a typical text 
5 string relational database query in order to retrieve XML documents from the relational 
database that match the text string query. Each of these modules will be described in 
more detail below. Now, an example of a Document Type Definition (DTD) of an 
XML document will be described to better understand the invention. This example of 
the DTD will be used as an example to illustrate the storage and retrieval system in 
10 accordance with the invention. 

Figure 3 is a diagram illustrating an example of a Document Type Definition 
(DTD) tree 60 for an XML document. Although not required to do so, an XML 
document typically conforms to a DTD which, loosely speaking, is a schema for the 
data found in the document. However, XML documents are semi -structured in the 
15 sense that there are elements specified in the DTD that may be optionally present and 
some that may be present more than once. This is in contrast to typical relational 
database tables where each record must have either zero (if it is NULL) or only one 
value for an attribute. 

— ■ XML documents also resemble an object-oriented database in that there are 

20 parent-child relationships between elements which are not found between attributes in 
a relational database. The following example of an XML document should help make 
these distinctions more clear. An example of the XML DTD syntax may be: 

< J ELEMENT library (book*, periodical*^ 
25 <!ELEMENT book (title, author+)> 

<! ATTUST book edition CD ATA #REQUIRED> 
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<! ELEMENT author (title?, firstname, lastname)> 

In the above example, elements that appear within parentheses are the children 
of elements before the parentheses. In addition a denotes 0 or more occurrences of 
5 the element, a u +" denotes one or more occurrences and a '*?" denotes 0 or 1 

occurrence. The above example DTD may be represented by the DTD tree shown in 
Figure 3. The DTD tree 60 may include a root node 62 (containing the element 
"library" in this example), one or more intermediate nodes 64 and one or more leaf 
nodes 66 that do not have any further nodes attached to them. An example of an XML 
10 document 70 that conforms to the DTD is shown in Figure 4. It contains the instances 
of elements in the DTD tree along with data for each element. The conversion of this 
example of an XML document into a format that may be stored in a relational database 
in accordance with the invention will now be described. 

Figure 5 is a flowchart illustrating an example of a method 80 for storing XML 
15 documents in a relational database in accordance with the invention. The method 

involves computing three properties, each of which is described below, for each XML 
document node so that the XML document may be stored, in an efficient manner, in a 
relational database. The encoding scheme set forth below is a preferred encoding 
embodiment. However, other encoding schemes may also be used. For example, the 

20 encoding set forth below (e.g., 1/2/5/6) may be represented as 1 raised to the power 1, 
2 raised to the power 2, 3 raised to the power 5 and 4 raised to the power 6 and so on. 
That way, instead of performing string manipulation, the system would be doing 
factorization. Based on this other encoding, the factorization approach can generate 
faster queries and save indexing and database space. Thus, the invention is not limited 

25 to any particular encoding and the encodings in accordance with the invention are 
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created based on the structure of the document and then the encodings are used to 
store, index and search for the content while preserving the hierarchy of the document. 
In a first step 81 of the method, it is determined if an element is ready for processing. 
If there is an element ready for processing, then the method generates an XMLName 
5 property for the particular element. If an element is not ready for processing, but an 
attribute of the XML document is read for processing, then the method also generates 
the XMLName property for the particular attribute. In more detail, the method starts 
by assigning each element name a unique XMLName property (in this example, the 
property is alphanumeric). For the example above, we could assign the XMLNames as 
10 shown in Table 1 (the XMLName Table). 

Table 1 (the "XMLName Table") 



Element or Attribute Name 


XMLName 


library 


1 


book 


2 


periodical 


3 


edition 


4 


title 


5 


author 


6 


ilrstname 


7 


lastname 


8 



Note that "title" gets only one XMLName value even though the element 
appears twice in the DTD tree as either the title of a book or the title of an author. This 
15 allows for more XMLName attributes to be encoded given strings of a specific length. 
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Now, in step 84, a NamePath value is automatically determined for each node 
of the DTD tree, in particular, the NamePath value may be constructed from the 
XMLNames of each node on the path from the root node to the node of interest. From 
this analysis., we obtain the following table of NamePath values for the example XML 
5 document: 
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Namcfatfa ims 



DTD Node 


NamePath 


library 


1 


library/book 


1/2 


library/periodical 


1/3 


library/book/edition 


1/2/4 


library/book/title 


1/2/5 


library /book/author 


1/2/6 


libraiy/boak/author/title 


1/2/6/5 


library /book/auuhor/firstname 


1/2/6/7 


library/book/author/] asmame 


1/2/6/8 



As shown in the tabic, each DTD node, such as 'Uibrary/book/author/Iastname 
5 has a corresponding NamePath value, such as "1/2/6/8". In this manner, using the 
NamePath values,, it is possible to navigate through the XML document using the 
relational database. In other words, using this table, the path to any node in the DTD 
tree (and hence the XML document) may be easily determined. This table may also be 
stored in the relational database. 

10 Next, in step 86, the method may automatically generate an OrderPath value for 

each node in the XML document. In particular, each number in the slash-separated 
OrderPath (see the table below) denotes the breadth-wise enumerated order of the node 
on the path from the root to the node of interest Each document node may also inherit 
the NamePath of the DTD node of which it is an instance. A full DocNode Table for 

15 the example XML document looks like this: 
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BasE&te Table 



NodeNamc 


NamcFath 


OrderPath 




library 


1 


1 




book 


1/2 


1/1 




edition 


1/2/4 


1/1/1 


first 


title 


J/2/5 


i/i /? 


i ne AMI Revolution 


author 


1/2/6 


W -1/ J 




title 


1/2/6/5 


1/1/3/1 


Software Engineer 


flrstnarnc 


1/2/6/7 


1/1/3/2 


David 


* iastn-arne 


1/2/6/8 


1/1/3/3 


"W r\ ) 1 r » r-i Urt ^1/ 

noxiLii DCCK 


suthor 


1/2/6 


1/1/4 




title 


1/2/6/5 


1/1/4/1 


v,aici /vrcnircci 


fkstname 


1/2/6/7 


1/1/4/2 


Carol 


Jastname 


1/2/6/S 


1/1/4/3 


Bohr 


book 


1/2 


1/2 




.edition 


1/2/4 


1/2/1 


second 


title 


1/2/5 


1/2/2 


Java Classes for XML 


author 


1/2/6 


1/2/3 




flrstname 


1/2/677 


3/2/3/1 


Carol 


lastnarnc 


1/2/6/8 


1/2/3/2 


Hollenbeck 


author 


1/2/6 


1/2/4 




title 


1/2/6/5 


1/2/4/1 


XML Guru 


flrstname 


1/2/6/7 


1/2/4/2 


David 


lastname 


1/2/6/8 


1/2/4/3 


Bohr 



As shown in the Table that may be stored in a relational database, each 
document node may include a NodeNamc value (the name of the element), a 
5 NamePath value (See above), an OrderPath Value (automatically generated during this 
step), and a NodeVaJue value (containing the actual data in that particular node). 
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In step 88, the method determines if there arc any more nodes to process and 
loops back to step 81 if there axe more nodes. If all of the nodes have been processed, 
then the DocNode Table may be saved in the relational database. In this manner, an 
XML document is automatically processed in order to generate a DocNode Table that 
5 may be -stored in any relational database. Once the DocNode table is generated by the 
system, it may be searched as will now be described in more detail. 

Figure 6 is a flowchart illustrating a method 1 00 for retrieving an XML 
document from a search of a relational database in accordance with the invention. In 
step 102, the user or the system using user input, may generate a relational database 
10 query. In step 104, the system may query the relational database and in step 100, the 
query results are output to the user. In accordance with the invention, the system may 
convert the query results back into references to portions of the XML document so that 
the user may review the portions of the XML document retrieved during the search in 
step 108. Now, several examples of retrieving XML documents based on a relational 
15 database search will be provided. In particular, a few examples will be shown of how 
the system may use the NamePatb and OrderPath values to select nodes with desired 
attributes from the XML document repository and also may construct fragments of the 
original XML documents containing these selected nodes. In all the sample queries 
below, we assume that we know the context (i.e., the position within the DTD tree) of 
20 the nodes we are interested in. 

In a first example, a user wants to query the XML document repository to 
return the titles of all books who have an author with the title of "Chief Architect' 1 . 
Since we know the context of title (i,e., library/book/author/title), we can consult the 
XMLName Table to obtain the relevant XMLNarnes and construct the NainePath of 
25 title which is " 1/2/6/5" in this example. Then, the system may issue the first query that 
is: 
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"Select OrderPath from DocNodeTable where NamePath « * 1/2/6/5' and 
NodeValue - 'Chief Architect 1 " 

f 

This query returns an OrderPath of "1/1/4/1 " as the result. Since we also know 
that the element "book" is a grand-parent of element "title", we can deduce that its 
5 OrderPath is 1/1 . Finally we construct the NamePath of the element "book title" as 
"1/2/5" and execute the second query that is : 

"Select NodeVahie from DocNodeTable where NamePath = 4 1/2/5* and 
OrderPath like ' l/i/%"\ 

This second query returns the value "The XML Revolution" as the. result. This 
1 0 result accomplishes the user goal of returning all books whose author's title is "Chief 
Architect". In this manner* the XML document repository is queried using typical 
relatiorial database queries. 

In this second example, the user wants to search for the titles of all books who 
have an author by the name of Carol Hollenbeck. To accomplish this, the system may 
1 5 generate a 6rst query to select the OrderPaths of all flrstname nodes with the value 
Carol: 

"Select OrderPath from DocNodeTable where NamePath = 4 1/2/6/7' and 
NodeValue = 'Carol'". 

This query returns "1/1/4/2" and "1/2/3/1 " as the result set. Next, a second 
20 query is generated to select the OrderPaths of all lastname nodes with the value 
Hollenbeck: 

"Select OrderPath from DocNodeTable where NamePath = '1/2/6/8' and 
Node Value = 'Hollenbeck'" 

-15- 



BNS0OCID: «WO 0142B81A2 I > 



BNS oaae 1 



This query returns "1/1/3/3" and "1/2/3/2" as the result set. Since wc know 
firstname and lastname nodes of the same person belong to the same parent author 
node, wc can deduce from the result sets that only the nodes with OrderPatbs "1/2/3/1 " 
and "1/2/3/2° arc of interest to us. Thus, we want the title of the book with OrderPath 
1/2, which we can retrieve with the following query: 

"Select Node Value from DocNodeTable where NamePath = '1/2/5' and 
OrderPath like* 1/2/%'" 

This query returns "Java Classes for XML" as the result which is the proper 

result. 

In a third example, the user wants to be returned all the information pertaining 
to the authors of 'The XML Revolution" and presented in the original document order. 
Thus, first, the OrderPath of the relevant title node is determined by the following 
query; 

"Select OrderPath from DocNodeTable where NamePath = 4 1/2/5" and 
Node Value = 'The XML Revolution 1 " 

This query returns " 1/1/2" as the result. Thus, as a result of the first query, we 
know that the OrderPath of the relevant book node is n 1/1 *\ Since the nodes for all 
author information are descendants of the author node (that has NamePath "1/2/6"), 
which in turn is a child of the "book" node, we -can execute the following query to 
obtain the required result: 

"Select Node Value from DocNodeTable where NamePath like '1/2/6/%' and 
OrderPath like € l/l/%' Order by OrderPath" 
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This query returns "Software Engineer, David, Hollcnbeck, Chief Architect, 
Carol, Bohr" in the original document order as the result set. 

Now, several enhancements to the system and method described above will be 
provided. In accordance with another aspect of the invention, the XMLName Table 
5 may be cached in memory. In particular, to facilitate construction of the NamePath 
values, we can store the contents of XMLNarae Table in a hash table which we keep 
resident in memory. This prevents the execution of multiple queries against the 
database to obtain all the necessary XMLName values. In accordance with yet another 
aspect of the invention, the XMLName values may be divided into NameSpaces. In 
10 particular, as the number of XMLName values increases, it may become necessary to 
divide the values into various namespaces to keep the lengths of the names short. 
XMLName values from namespaces relevant for working with a particular document 
can then be brought into the cache when necessary without having to bring the entire 
XMLNameTable into memory. 

15 In accordance with yet another aspect of the invention, the system may use 

base-64 encoding. In particular, to reduce the amount of storage required for the 
XMLName, NamePath, and OrderPath tables in the relational database, we could 
consider using a Base-64 encoding scheme instead of alphanumeric strings. In 
• accordance with the invention, it is also possible to add a DigitPath attribute as an 
20 adjunct attribute to OrderPath so that the system can ensure proper sorting of nodes 
while obviating the need for place-holding characters as the number of characters 
increases. For example, to sort the paths f 'l/10/2" and "1/2/3" properly, the system 
would have needed to encode the second as "1/-2/3". However, if we added "1/2/1 M 
and "1/1/r 1 as DigitPaths and ordered the results by these before OrderPaths, then we 
25 would be able to do without the place-holding dashes. 
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In accordance witli the invention, a RcverseNamePath attribute may be 
automatically generated to further improve the speed of queries. In particular, since it 
is possible to have an XML document that is an instance of a DTD sub-tree, we may 
need to evaluate an expression such as: 

5 "Select NodeValue from DocNodc Tabic where NamePath like *%/ 1/2/3 ' " 

Since indexes built on NarncPath generally do not help in the execution of such 
queries, we can improve performance by having a RcverseNamePath attribute 
constructed by reversing the order of the XMLNamcs in the path expression. Thus, in 
accordance with the invention* the above query would now read: 

10 "Select NodeValue from DocNodeTable where ReverseNamePath like 

4 3/2/l/°/o ,M 

In accordance with the invention, the system may include a transformation 
engine that converts XPata expressions into equivalent SQL statements involving 
NamePath and OrderPath attributes so that the converted queries would then be 
15 executed against the repository. 

In summary, a system and method for assigning attributes to XML document 
nodes to facilitate their storage and indexing in relational databases and the subsequent 
retrieval and re-construction of pertinent nodes and fragments in original document 
order is provided. Since these queries are performed using relational database query 
20 engines, the speed of their execution is significantly faster than that using more exotic 
systems such as object-oriented databases. Furthermore, this method is portable across 
all vendor platforms, and so can be deployed at client sites without additional 
investments in database software. 
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In accordance with the invention, the hierarchical relationships of XML 
documents are encoded so that the XML documents may be mapped to a set of 
relational tables. Once the mapping and encoding is completed, then searching and 
querying of the XML documents may be done by mapping any XML query language 
5 (which is -well known) to SQL (also well known) automatically. 

While the foregoing has been with reference to a particular embodiment of the 
invention, it will be appreciated by those skilled in the art that changes in this 
embodiment may be made without departing from the principles and spirit of the 
invention as set forth in the appended claims. 
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CLAIMS: 

1 LA computer system for manipulating an XML document using a 

2 relational database, comprising: 

3 a converter that receives an XML document and generates a predetermined set 

4 of relational database tables based on the XML document; 

5 a database for storing the relational database table; and 

6 a searcher for querying the generated relational database table in the database ro 

7 locate content originally in the XML document that is now stored in the relational 
S database table wherein the located content is returned to the user as a portion of an 
9 XML document. 

1 2. The system of Claim 1, wherein the converter further comprises a 

2 software module rhat generates a unique name attribute for each node in the XML 

3 document. 

1 3- The system of Claim 2, wherein the converter further comprises a 

2 software module that generates a path attribute for a particular node of the XML 

3 document wherein the path attribute comprises a list of the name attributes for the one 

4 or more nodes from the particular node to a root node of the XML document. 

1 4. The system of Claim 3, wherein the convener further comprises a 

2 software module that generates an order attribute for the particular node, the order 

3 attribute comprising an enumerated order of the particular node from the root node to 

4 the particular node. 
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1 5. The system of Claim 4, wherein the converter further comprises a 

2 software module that generates a NodeVahie attribute containing a value of the 

3 particular node, 

1 6. The system of Claim 5, wherein the searcher further comprises a query 

2 generator that generates a query into the database to find a piece of information in the 

3 database 

4 corresponding to information in a node of the XML document and a converter 

5 that converts the results of the query into portions of an XML document that are 

6 displayed to the user. 

1 7. The system of Claim 2, wherein the name attribute for each node in the 

2 XML document is stored in a hash table so that the name attributes are retrieved from 

3 the hash table instead of the database. 

1 g. Tbe system of Claim 2, wherein the name attributes of the nodes of the 

2 XML document are divided into one or more categories so that related name attributes 

3 are grouped together. 

1 9. The system of Claim 1, wherein the name attributes are encoded using 

2 base-64 encoding. 

1 10. The system of Claim 3, wherein the converter further comprises a ( 

2 software module that generates a reverse path comprising the list of name attributes 

3 from the path attribute in reverse order. 
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1 11. The system of Claim 1, wherein the converter further comprises a 

2 transform engine that converts Xpath expressions in the XML document into SQL 

3 queries. 



1 12. A computer system for storing an XML document using a relational 

2 database, comprising: 

3 a converter that receives an XML document and generates a relational database 
A table based on the XML document; 

5 the converter further comprising a software module that generates a unique 

6 name attribute for each node in the XML document, a software module that generates a 

7 path attribute for a particular node of the XML document wherein the path attribute 

8 comprises a list of the name attributes for the one or more nodes from the particular 

9 node to a root node of the XML document, a software module that generates an order 

10 attribute for the particular node, the order attribute comprising an enumerated order of 

11 the particular node from the root node to the particular node, and a software module 

12 that generates a NodeValue attribute containing a value of the particular node. 

1 13 . A method for manipulating an XML document using a relational 

2 database, comprising: 

3. generating a relational database table based on an XML document wherein the 

4 information about each node of the XML document is stored in a row of the table; 

5 storing the relational database table in a database; and 

6 querying the generated relational database table in the database to locate 

7 content originally in the XML document that is now stored in the relational database 

8 table wherein xbe located content is returned to the user as a portion of an XML 

9 document. 
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1 14. The method of Claim 13, wherein generating the table further comprises 

2 generating a unique name attribute for each node in the XML document. 

1 15. The method of Claim 14, wherein generating the table further comprises 

2 generating a path attribute for a particular node of the XML document wherein the path 

3 attribute comprises a list of the name attributes for the one or more nodes from the 

4 particular node co a root node of the XML document. 

1 16. The method of Claim 15, wherein generating the table further comprises 

2 generating an order attribute forthe particular node, the order attribute comprising an 

3 enumerated order of the particular node from the root node to the particular node. 

1 1 7. The method of Claim 16, wherein generating the table further comprises 

2 generating a NodeValuc attribute containing a value of the particular node. 

1 18. The method of Claim 1 7 7 wherein querying the database further 

2 comprises generating a query into the database to find a piece of information in the 

3 database corresponding to information in a node of the XML document and converting 

4 the results of the query into portions of an XML document that are displayed to the 

5 user. 

1 19. The method of Claim 14 further comprising reuievdAg the name 

2 attribute for each node in the XML document from a hash tabic so that the name 

3 attributes are retrieved from the hash table instead of the database. 
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1 20. The method of Claim 14, wherein the name attributes of the nodes of 

2 the XML document are divided into one or more categories so that related name 

3 attributes are grouped together. 

1 21. The method of Claim 13, wherein the name attributes are encoded using 

2 base-64 encoding. 

1 22. The method of Claim 15, wherein generating the table further comprises 

2 generating a reverse path comprising the list of name attributes from the path attribute 

3 in reverse order. 

1 23. The method of Claim 13, wherein generating the table further comprises 

2 converting Xpath expressions in the XML document into SQL queries. 

1 24. A data structure that stores a node of interest of an XML document in a 

2 relational database, the data structure comprising: 

3 an XMLName attribute comprising a unique name for the node of interest; 

4 a NamePath attribute comprising a Jist of the XMLName attributes for the one 

5 or more nodes from the node of interest to a root node of the XML document; 

6 . • an OrderPath attribute comprising an enumerated order of the node of interest 

7 from the root node to the node of interest; and 

8 a NodeValue attribute containing a value of the node of interest. 

1 25. The data structure of Claim 24, wherein the data structure comprises a 

2 table in a relational database and each attribute comprises a column in the table in the 

3 relational database. 
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NESTED RELATIONAL DATA MODEL 

rnPYPTfiHT NOTICE 
A portion of the disclosure of this patent document contains material that 
is subject to copyright protection. The copyright owner has no objection to the facsimile 
5 reproduction by anyone of the patent document or the patent disclosure as it appears in 
the Patent and Trademark Office patent file or records, but otherwise reserves all 
copyright rights whatsoever. 

FTFTT3 OF THE INVENTION 
The present invention relates to information management in general and 
10 more particularly to methods for using Nested Relational Data Models (NRDMs) to 
manage information. 

BACKGROUND OF THE INVENTION 
Information is commonly managed in units of documents. For example, 
sales, distribution and manufacturing information might be contained within documents 
1 5 such as sales invoices or orders. Increasingly, documents pass between parties in 

electronic form, in a process generally referred to as EDI (Electronic Data Interchange). 
In electronic form, the documents are not limited to the text and images shown on the 
printed page, but can include formatting and "metadata" (data about the data). One 
example of a format for an electronic document that contains metadata is the Extended 

20 Markup Language (XML). 

Several products on the market allow mapping of XML documents to SQL 
tables or vice versa and several products on the market allow mapping of EDI documents 
to relational tables or vice versa, but these products typically require procedural 
specifications of how to perform the conversion, such as programming code. Traditional 

25 Relational Database Management Systems (RDMS's) such as described by Date or 
Ullman or implemented by Oracle, IBM, Microsoft and others as well as distributed 
databases as described in Ceri or U.S. Patent Nos. 5,884,3 10 and 5,596,744, implement 
declarative transformations of relational data. 

A class of systems called intelligent gateways (such as Sybase's 

30 OmniServer system) allow declarative rules to be transparently applied to heterogeneous 
relational databases. Another class of systems called Replication Servers (such as 
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described by U.S. Patent No. 5,1 37,601 or implemented as Sybase's Replication Server, 
Oracle's Replication Server, or the like) can provide homogeneous or heterogeneous data 
replication. 

Additional class of systems called the ETL (Extraction, Transformation, 
5 Loading) systems such as Microsoft DTS, Informatica PowerMart and D2K Tapestry 
provide extraction, transformation and loading of heterogeneous data between relational 
database systems. Some of these products support converting hierarchical files into a 
relational form by "flattening" the hierarchical files, making multiple passes through a 
hierarchical file and, at each pass, pulling out different parts of the hierarchy. 

10 Yet another class of systems that address mapping of relational data to a 

programming object, as exemplified by U.S. Patent Nos. 6,175,837, 6,163,781, 
6,134,559, 5,907,846, 5,873,093, 5,832,498, or products from Persistence, Bea and 
others. This class of tools maps persistently stored relational data to an object-oriented 
memory representation as well as mapping the data from an object-oriented memory 

1 5 representation to a set of persistent relational tables. 

Another class of prior art exists that provides object-oriented access to 
non-relational databases, as described in U.S. Patent Nos. 5,799,313, 5,778,379, and 
5,542,078. This class of systems addresses the mapping of data from hierarchical 
databases such as IMS, object oriented databases and relational databases to. an 

20 object-oriented programming object or database. 

Considerable research has been done on Nested Relational Data Models as 

described in , "Lecture Notes in Computer Science Volume 595: M. Levene - The 

Nested Universal Relation Database Model" and , "Lecture Notes in Computer 

Science Volume 361: S. Abiteboul et al. - Nested Relations and Complex Objects in 

25 Databases". That research focused mainly on defining the data model and specific 
operations on it. 

It is known to graphically map disparate schemas to each other. See, for 
example, U.S. Patent Nos. 5,850,631 and 5,806,066. It is also known to map data 
between different structures. See for example, U.S. Patent Nos. 5,627,972 and 5,119,465. 

30 SUMMARY OF THE INVENTION 

In one embodiment of data processing system according to the present 
invention, hierarchical documents or hierarchical messages are mapped to a Nested 
Relational Data Model to allow for transformation and manipulation using declarative 
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statements. The resulting nested data can be converted to a relational format and mapped 
to multiple relational tables, and/or converted from a nested relational format to an 
external hierarchical format, such as XML. 

The system can specify and execute declarative rules to extract, transform, 
integrate load and update hierarchical and relational data. The system can also be used 
for extending documents with relational and non-relational data and applying updates 
based on these documents to relational database targets. The system can also be used for 
mapping Nested Relational Data to function calls that accept tables as parameters and 
return multiple scalar and table parameters as output. 

pptpp DESCRIPT ION nv THE DRAWTNGS 
Fig. 1 shows a table that is related to a single row of another table. 
Fig. 2 shows the data of Fig. 1, organized as multiple rows in a single 

Fig. 3 shows the data of Fig. 1. organized as multiple tables related by a 

Fia 4 illustrates multiple levels of nested tables contained in one column. 
Fig. 5 illustrates a more general example of multiple levels of nested tables 

contained in more than one column. 

Fig. 6 is a block diagram of a database system according to one 

?0 embodiment of the present invention. 

Fig . 7 illustrates schema relating to nested tables; Fig. 7 A shows input 

tables and Fig. 7B shows an output schema. 

Fig. 8 illustrates a process of grouping values across nested tables. 
Fi' 9 illustrates a process of unnesting data; Fig. 9A shows how a table 
with a nested table would be unnested into a cross-product of the parent table and a child 
(nested) table; Fig. 9B illustrates unnesting into separate tables; Fig. 9C illustrates 

unnesting at multiple levels. 

Fig. 10 illustrates a case where unnesting might produce unintended 



10 

table. 
15 join. 



25 



30 



effects, 
query. 



Fig. 1 1 graphically illustrates an unnesting process and its effects on a 

Fig. 12 illustrates a process of converting a DTD to tables. 
Fig. 13 illustrates the XML encoding of a DTD definition. 
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Fig. 14 illustrates various real-time data flows. 
Fig. 15 illustrates an operation of joining two inputs in a query. 
■ Fig. 1 6 illustrates real-time data flows that use supplementary information. 
Fig. 17 illustrates data flows depending on cached values. 
5 Fig. 18 illustrates branching data flows based on rules. 

Fig. 19 is an illustration of a complex real-time data flow. 

Fig. 20 is an illustration of a GUI for specifying a data flow. 

Fig. 21 is a block diagram of a schema conversion system. 

Figs. 22-26 are tables illustrating various aspects of an NRDM system. 

10 DESCRIPTION OF THE SPFCTF1C EM BODIMENTS 

In a specific embodiment a Nested Relational Data Model (NRDM) is 
designed to support hierarchical and relational components used to represent business 
data. Business documents are typically hierarchical with multiple repeating sets. For 
example, an order contains a set of repeating line items. It may also have a set of 

15 customers associated with it. 

Business documents used to exchange data between software systems 
within an enterprise or between enterprises need to be represented as complex 
hierarchical documents. The industry and the research community use well-known 
representations such as EDI and XML to capture and represent such documents. The 

20 system described herein provides methods for mapping such documents to a Nested 
Relational format, methods for transforming and manipulating of these documents 
represented using the Nested Relational Data Model, converting such documents to 
relational format and mapping them to multiple relational tables, and a method of 
converting the data in a nested relational format back. to an external hierarchical format 

25 such as XML. 

The system provides a method to apply declarative rules to map the 
hierarchical (e.g., XML or EDI) data to relational tables and vice versa; declarative rules 
to enrich hierarchical data with data from other relational or hierarchical sources; 
declarative rules to perform multi-stage transformations. The system allows declarative 
30 transformations to be applied to hierarchical data, and the ability to transparently apply 
rules to heterogeneous databases and files; as well as in the ability to apply multi-stage 
transformations. Delcarative specifications (such as SQL) describe what to do with data, 
as opposed to procedural specifications (such as C++ code) that described how to do it. 



3DOCIO: <WO 0159602A1J_> 



WO 01/59602 PCT/US01/04698 

"Nested data" is data in a table that is related to a single row of another 
table. Sales orders are often presented using nesting: the line items in a sales order are 
related to a single header. For a table of sales order headers, each row includes its own 
table ofline items. An example of this is shown in Fig. 1. Of course, the same data could 
5 be represented without nested tables. For example, the data could be represented as 

multiple rows in a single table as shown in Fig. 2, or as multiple tables related by a join as 
shown in Fig. 3. 

One source of data for a nested table is the result of a query using the 
values in the related row in the parent table. As used herein, "parent table" refers to a 

10 table within which another table is nested and "child table" or "nested table" refers to a 
table that is nested in a column of a parent table. A nested table is said to have a 
relationship with the table within which it is nested and where levels are associated with 
tables, a parent table would have a level that is designated with a number one higher than 
the child tables nested in that parent table. For example, Fig. 4 shows a parent table 10, a 

15 nested (child) table 12 one level below table 10 and nested tables 14(a)-(b) that are nested 
in table 12 and are two levels below table 10. 

Preferably, a unique instance of each nested table exists for each row at 
• each level of a relationship. As illustrated in Fig. 5, each row at each level can have any 
number of columns containing nested tables. 

20 Fig. 6 shows various aspects of a database system 1 00 that handles NRDM 

data. System 100 is shown comprising a metadata mapper 104 that maps DTD 102 
w/hierarchical structures to NRDM schema that are stored in schema storage 106. These 
components are shown as being part of a preprocessing section, with other portions being 
part of a real-time section, but it should be understood that all of the process or none of 

25 the processing might be done in real-time without departing from the essence of the 
invention. Notwithstanding that caveat, the descriptions below reference an example 
wherein DTDs are converted to NRDM schema and stored. and documents are converted 
by system 1 00 in real-time after such conversion. 

One such real-time process involved a document 1 1 0 being passed to an 

30 importer, then to a transformation engine (TE) 114 and an exporter 1 16 to result in a 
document in a new format 1 18 (in some cases, the formats of document 1 10 and 
document 118 might be the same, but some transformation has occurred). Document 110 
is a structured document, such as an XML document, an HTML page, a document having 
other structure, or other structured data object. 
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Importer 1 12 cor verts the document into NRDM data so that TE 1 14 can 
operate on data in the NRDM sp ace, thus simplifying many transform operations, as 
described below. TE 114 accepts data in NRDM format as its input and outputs data in 
NRDM format. Of course, data in NRDM (Nested Relational Data Model) format need 
5 not have nested data (for example, if the input data can be structured such that nesting is 
. not needed). Because TE 114 operates on NRDM structures, the transformations 
performed by TE 1 14 can be expressed simply as a declarative specification, thus greatly 
simplifying the process of transforming complex data. In effect, importer 112 converts a 
hierarchical document into a relational database form to which declarative statements can 
1 0 be applied. 

Exporter 1 16 exports the data in a suitable form, such as XML documents, 
relational tables or flat files. 
Data Flows 

In a graphical interface used to build data flows and/or nested data 

15 structures, such as the ActaWorks™ system developed by Acta, Inc. structures of nested 
data in input and output schemas of sources, targets, and transforms in data flows are 
presented to a designer. An example of an input schema 60 is shown in Fig. 7A and an 
example of an output schema 62 is shown in Fig. 7B. Input schema 60 shows a table A 
that has columns columnl , column2 and a column for a nested table B, which in turn has 

20 columns column4 and columns. Input schema 60 also shows a table Z that has columns 
columnl 1, column 12 and a column for a nested table Y, which in turn has columns 
columnl4 and columnl 5. In Fig. 7A, and others, nested tables appear with a table icon 
paired with a plus sign, which indicates that the object contains columns (a minus sign 
indicates that the object is open and if it has columns, those columns are visible. 

25 In a relational database system (RDS) using a declarative language such as 

SQL, a query transform might take the form of a SELECT statement that is executed by 
the RDS. When working with nested data in an nested relational data model (NRDM) 
system according to some aspects of the present invention, the query can specify 
SELECTS at each level of a relationship defined in the output schema. Thus, while a 

30 SELECT statement might be constrained to include only references to relational data sets, 
a query that includes nested data might include a SELECT statement to define operations 
on each table in the output-each context for the input data set is transformed. 

In such an NRDM system, the FROM clause descriptions and the behavior 
of the query are the same with nested data as with relational data, but the new interface of 
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contexts allows the data flow des.gner to distinguish multiple SELECTS from each other 
within a single query. At any context, the FROM clause can contain any top-level table 
from the input or any table that is a column of a table in the FROM clause of the next 
higher context. 

When rows of one table (a child table) are nested inside another table (a 
parent table), the data set produced in the nested table is the result of a query against the 
first table using the related values from the second table. For example, if sales 
information is available as a header table and a line-item table, the sales information can 
be organized as a parent table of header information and a child table containing lme-.tem 
data here the line-items are nested under the header table. The line items for a single row 
of the header table are equal to the results of a query including the order number, as rmght 
be found using the following statement: 
erI FrT * from Lir.eltems 

SELECT FROM_ . 0rderNo = Lin e Items . OrderNo 

Correlation can be used to construct a nested table from columns from a 
hioher-level context. In a nested-relational model, the columns in a nested table are 
implicitly related to the columns in the parent row. To take advantage of this 
relationship, the parent table can be used in the construction of the nested table. The 
higher-level column is a correlated column. Including a correlated column in a nested 
table may serve at .east two purposes: 1) the correlated column is a key in the parent table 
and 2) makint the correlated column an attribute in the parent table. Including the key in 
the nested table allows for the maintenance of you a relationship between the two tables 
after converting them from the nested data model to a relational model. Including the 
attribute in the nested table allows for the use of the attribute to simplify correlated 

queries against the nested data. 

Correlated columns can include columns from the parent table and any 
other tables in the FROM clause of the parent table. If the correlated column comes from 
a table other than the immediate parent, the data in the nested table includes only the rows 
that match both the related values in the current row of the parent table and the value of 

the correlated column. 

Values can be grouped across nested tables. Thus, when a statement 
includes a Group By clause for a table with a nested table, the grouping operation 
combines the nested tables for each group. For example, to assemble all the line items 
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included in all the orders for each state from a set of orders, the designer would set the 
Group By clause in the top-level of the data set to the state column (Order.State) and 
create an output table that includes State column (set to Order.State) and Lineltems nested 
table. The result of such an operation might result with the table shown in Fig. 8. The 
5 result is a set of rows (one for each state) that has the State column and the Lineltems 
nested table that contains all the Lineltems for all the orders for that state. 

Nested data can also be unnested. When loading a data set that contains 
nested tables into a relational (non-nested) target, the nested rows will be unnested. Take, 
for example, a message containing a sales order that uses a nested table to define the 
10 relationship between the order header and the order line items. To load the data into 
relational tables, the multi-level must be unnested. Unnesting a table produces a 
cross-product of the top-level table (parent) and the nested table (child), as shown in Fig. 
9A. Different columns from different nesting levels might be loaded into different tables. 
A sales order, for example, may be flattened so that the order number is maintained 
15 separately with each line item and the header and line item information loaded into 
separate tables, as shown in Fig. 9B. 

Any number of nested tables can be unnested at any depth. No matter how 
many levels are involved, the result of unnesting tables is a cross product of the parent 
and child tables. When more than one level of unnesting occurs, the inner-most child is , 
20 unnested first, then the result— the cross product of the parent and the inner-most child— is 
then unnested from its parent, and so on to the top-level table, creating the result shown in 
Fig.9C. 

Unnesting all tables (cross product of all data) may not produce the results 
intended. For example, if multiple customer values are included in an order, such as 
25 ship-to and bill-to addresses, flattening a sales order by unnesting customer and line item 
tables produces rows of data that may not be useful for processing, the order. This is 
illustrated in Fig. 10. Using the GUI, the specification of the data flow is shown in Fig. 

11. • • 

A DTD (document type definition) describes the data schema of an XML 
30 message or file. Real-time data flows read and write XML messages based on a specified 
DTD format. One DTD can describe multiple XML sources or targets. Batch data flows 
can read and write data to files based on a specified DTD format. 
; . DTDs can be imported into the NRDM system, either directly or by 

importing an XML document that contains a DTD. During import, the NRDM system 
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converts the structure defined in the DTD into an internal nested-relational data model. 
Elements below the root-level that contain other elements become nested tables and 
elements that do not contain other elements become columns. Attributes become 
columns in the corresponding element's schema. 
5 The NRDM system applies the following rules to convert the DTD to 

tables, columns, and nested tables: 

- Any element that contains PCDATA only and no attributes becomes a column. 

- Any element with attributes or other elements (or in mixed format) becomes a table. 

- An attribute becomes a column in the table corresponding to the element it supports. 
1 0 - Any occurrence of choice operators is converted to strict ordering. 

- Any occurrence of optional operators is converted to strict ordering. 

_ Any occurrence of Q* or () + becomes a table with an internally generated name-an 
implicit table. 

After these rules have been applied, the NRDM system optimizes the 
1 5 format using two more rules, except where doing so would allow more than one row at 
the root element: 

- If an implicit table contains one and only one nested table, then the implicit table can 
be eliminated and the nested table can be attached directly to the parent of the implicit 
table. For example, the SalesOrder element might be defined as follows in the DTD: 

20 

< ! ELEMENT SalesOrder (Header, Linelcems* ) > 

When converted, the Lineltems element with the zero or more operator would 
become an implicit table under the SalesOrder table. The Lineltems element itself 
25 would be a nested table under the implicit table, as shown in Fig. 12 A. Because 

the implicit table contains one and only one nested table, the format would be 
optimized to remove the implicit table, as shown in Fig. 12B. 

- If a nested table contains one and only one implicit table, then the implicit table can 
be eliminated and its columns placed directly under the nested table. For example, the 

30 nested table Lineltems might be defined as follows in the DTD: 

< ! ELEMENT Lineltems (ItemNum, Quancicy') * > 

When converted, the grouping with the zero or more operator would 
35 become an implicit table under the Lineltems table. The ItemNum and Quantity elements 
would become columns under the implicit table, as shown in Fig. 12C. Because the 
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Lineltems nested table contained one and only one implicit table, it would be optimized 
to remove the implicit table, as shown in Fig. 12D. 

If the DTD contains an element that uses an ancestor element in its 
definition, the definition of the ancestor can be expanded for a fixed number of levels. 
5 For example, given the following definition of element "A": 

A: B, C 
B: E, F 
F: A, H 

10 The system produces a table for the element "F" that includes an expansion of "A." In this 
second expansion of "A," "F" appears again, and so on until the fixed number of levels. 
In the final expansion of "A," the element "F" appears with only the element "H" in its 
definition. 
P pal-Time Sources 

2 5 ' a real-time source in a real-time data flow determines the message that the 

real-time data flow will process. The source object represents the schema of the expected 
messages. Messages received are fit to the schema. Real-time data flows accept 
real-time source types such as Extensible Markup Language formatted (XML) messages 
or intermediate documents, such as IDocs published from an SAP R/3 application server. 

20 The format of the XML message is specified by a document type 

definition (DTD). The DTD describes the schema of data contained in the message and 
the relationships among the elements in the data. For a message that contains information 
to place a sales order-order header, customer, and line items-the corresponding DTD 
includes the order structure and the relationship between data, as shown by the example 

25 in Fig. 13. 

The following examples provide a high-level description of how real-time 
data flows address typical real-time scenarios. Fig. 14A shows a real-time data flow as 
might be used to load transactions into an ERP system, such as SAP R/3. A real-time 
data flow can receive a transaction from an electronic commerce application and load it to 
30 an ERP system. Using a query transform, one can include values from a data warehouse 
to supplement the transaction before applying it against the ERP system. 

Fig. 14B shows a real-time data flow for collecting ERP data into a 
warehouse. Real-time data flows can receive messages from the ERP through IDocs. 
Each IDoc contains a transaction that the real-time data flow can load into a data 
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warehouse or a data mart. In this way, IDocs can be used to keep the data in a warehouse 
current. 

Fig. 14C shows a real-time data flow for retrieving values from a cache or 
and ERP system. This allows for real-time data flows that use values from a data 

5 warehouse to determine whether or not to query the ERP system directly. 
Sup plementary Sources 

When more data is needed than what is provided in the content of a 
message to complete the message processing, supplementary sources might be used. For 
example, processing a message that contains a sales order from an electronic commerce 

10 application that contains the customer name might require that, when the order is applied 
against your ERP system, more detailed customer information is needed. Inside the 
real-time data flow, the message is supplemented with the customer information to 
produce the complete document to send to the ERP system. The supplementary 
information may come from the ERP system itself or from a cache containing the same 

15 information cached. Examples of such data flows are shown in Figs. 15, 16A, 16B. 

Tables and files (including XML files) as sources in real-time data flows 
can provide this supplementary information. The real-time data flow extracts data from 
the supplementary source as indicated by the logic defined in the real-time data flow. 

Tables or files that are used as sources and have a cache option allow for 

20 the data extracted to be stored in memory until the data flow processing is complete. In 
real-time data flows, sources should not be cached unless the data being cached is small 
and is unlikely to be updated in the life of the real-time data flow. 

In batch data flows, caching can improve the performance of data flow 
processing by reducing the number of times a set of data is read from the database or file 

25 source. In real-time data flows, however, the improvement in performance provided by 
caching is minimized by the likelihood that the real-time data flow reads only a small 
amount of data from the source for any given message. In addition, because the real-time 
data flow reloads cached data only when an access server shuts it down and restarts it, 
cached data may become stale in memory. 

30 Tables can be sources in real-time data flows after their metadata is 

imported into the repository. When the real-time data flow starts, it opens a connection to 
the source database. This connection remains open as long as the real-time data flow is 
running. If a table is included in a join with a real-time source, the data set from the 
real-time source is included as the outer loop of the join. 
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R/3 tables can be sources in real-time data flows after their metadata is 
imported into the repository. When the real-time data flow performs a query against the 
R/3 table, it executes an R/3 function call to extract the data through the SAP R/3 
application server. This method of extracting data from SAP R/3 is particularly well 
5 suited to extracting a small amount of specific data (on the order of 1 to 10 rows) in a 
real-time system, but might not work well as a substitute to using R/3 data flows to 
produce ABAP programs to extract large amounts of data in a batch system. 

Data from XML files can be used as sources in real-time data flows, if a 
DTD that describes the data in the file is imported. 
10 Supplementing Message Data 

The data included in messages from real-time sources may not map 
exactly to requirements for processing or storing the information. If not, steps can be 
defined in the real-time data flow to supplement the message information. One technique 
for supplementing the data in a real-time source includes these steps in a real-time data 
15 flow: 

1 . Include a table or file as a source. In addition to the real-time source, include the 
files or tables that supply the supplementary information. 

2. Use a query to extract needed data from the table or file. Use the data in the 
real-time source to find the needed supplementary data. A join expression can be 

20 used in the query so that only the specific values required from the supplementary 

source are extracted. 

Fig. 16A shows an example where a message includes sales order information with the 
ultimate goal to return order status. In this case, the business logic uses the customer 
number and priority rating to determine the level of status to return. The message 
25 includes only the customer name and the order number. The real-time data flow is then 
defined to retrieve the customer number and rating from other sources before determining 
the order status. 

A real-time data flow might include logic to determine when responses can 
be generated from data in a cache and when they must be generated from data in an ERP 
30 system. One technique for constructing this logic includes the steps in the real-time data 
flow (illustrated in Figs. 17-20): 
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1. Determine the rule for when to access the cache and when to access the ERP 
system. 

2. Compare data from the real-time source with the rule. 

3. Define each path that could result from the outcome. Consider the case where the 
5 rule indicates ERP access, but the ERP system is not currently available. 

4. Merge the results from each path into a single data set. 

5. Route the single result to the real-time target. 

This example describes a section of a real-time data flow that processes a new sales order. 
The section is responsible for checking the inventory available of the ordered products-it 
10 finds an answer to the question, "is. there enough inventory on hand to fill this order?" The 
rule controlling access to the ERP system indicates that the inventory (Inv) must be more 
than a pre-determined value (IMargin) greater than the ordered quantity (Qty) to consider 
the cached inventory value acceptable. The comparison is made for each line item in the 
order. 

15 Fig. 18 illustrates a branch in the data flow based on a rule. An XML 

source contains the entire sales order, yet the data flow compares values for line items 
inside the sales order. The XML target that-ultimately returns a response requires a single 
row at the top-most level. Because this data flow needs to be able to determine inventory 
values for multiple line items, the structure of the output requires the inventory 

20 information to be nested. The input is already nested under the sales order; the output can 
use the same convention. In addition, the output needs to include some way to indicate 
that the inventory is or is not available. 

Fig. 19 illustrates several ways to return values from the ERP. For 
example, a lookup function or a join on the specific table could be used in the ERP 

25 system. The example uses a join so that the processing can be performed by the ERP 
system rather than the NRDM system. As in the previous join, if a value might not be 
returned by the join, an outer join can be defined so that the line item row is not lost. 

Fig. 20 illustrates a GUI used to specify transformations and a specific 
transformation specified with that GUI. 

30 Fig. 21 is a block diagram of a schema converter. In the example shown, 

an NRDM schema is converted to a DTD schema. 
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Qther Uses 

One of the advantages of operating a transformation engine on NRDM 
data structures, as described above, is that the transformation engine can operate on 
hierarchical data as if it were a relational table. Thus, hierarchical documents, such as 
5 XML documents can be operated on using declarative statements, such as SQL, 

regardless of how many levels of hierarchy are present. One method of effecting such a 
benefit is to nest child tables into columns of parent tables and use a transformation 
engine that handles NRDM data as its input and as its output. The transformation engine 
can be sandwiched between an importer that converts hierarchical documents into NRDM 
10 data structures and an exporter that generates hierarchical documents from NRDM data 
structures. 

There are various ways to implement NRDM data structures. For 
example, conventional relational tables can be used, where a column of the parent table 
stores a pointer to a child table. A separate child table could exist for each row of the 

1 5 parent table that does not have a NULL value for that row and column, or where the child 
tables for each row have corresponding formats, the data representing the child tables 
could be implemented as subtables of one child data-holding table. Regardless of the 
underlying structure, the transformation engine deals with the data structures as nested 
tables and applies declarative statements accordingly. 

20 Other aspects of the system described herein might find uses apart from 

NRDM data structures and systems. For example, requests received from applications for 
data processing and/or transformation might operate on nested tables, but might also 
operate on conventional relational tables. 

The applications often provide application programming interfaces (APIs) 

25 through with other programs interact with the application. Often, the designer of a 

program that interacts with the application must know the interfaces and correctly specify 
the parameters of a particular function call. However, some applications might accept as 
an input NRDM data or a hierarchical document. In some cases, the application interface 
could be such that the semantics of the function call are in a document submitted as a 

30 parameter and then one generic interface is all that is needed to call the application. 
Exam ple Implementation 

An example of an NRDM system according to various aspects of the 
present invention will now be described. It should be understood that the invention is not 
limited to this specific example. The example system supports hierarchical data models 
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such as IDoc and XML and provides for a hierarchical structure to support a hierarchical 
data model represented as a single row that contains scalar columns and repeating 
group(s) of embedded rows forming nested table(s), where nesting can be arbitrarily deep 
and an implicit relationship is not required between embedded rows and parent (i.e., the 
5 children rows do not need to contain a key to join it back to the parent row). 

The NRDM system can capture an entire business transaction in a single 
hierarchical structure and transform a hierarchical structure as a single entity using 
relation operators that can be applied at any level of the hierarchy. A hierarchical 
structure when applied as a single database transaction can be loaded to a set of tables 
10 belonging to a single datastore. 
Data Model 

In NRDM, the first normal form requirement that a column be a scalar is 
removed. In NRDM, a column can be a scalar or a relation value, which we refer to as a 
nested table. A scalar column definition has a name, type (including length, precision, 
15 domain info, etc.) and, at run time, contains either a value or a NULL indicator. A nested 
table definition has a name, schema (e.g., a list of column definitions) and, at run time, 
contains either one or more rows of the schema specified in the nested table definition or 
an empty table indicator (e.g., ISEMPTY). 
DDI, Operations 

20 AL_NESTED_TABLE is used below to define a nested table for DDL 

operations. For example, creating a view with nested table might be done by the 
following statements: 

CREATE VIEW VI ( 

ORD2R_ID INT, 
25 PROD_INFO AL_N*:STED_TABLE ( 

PROD_ID .INT, 
QTY INT, 

VENDOR_INFO AL_NE ST EDITABLE (VNDR_ID CHAR ( 5 ) , 

VNDR_CITY CHAR (65)) 

30 ■ >. 

CID INT , . 

CCITY CHAR (65) 

) ; 

Fig. 22 illustrates a data table that might result for the above statements. 
35 DML Operations 

Relational operations such as select, project, etc. can be used on NRDM 
data. Nested relations can be accessed as regular relations in the context (scope) of their 
parents. In other words, wherever a scalar column is used, a nested table can be used. If 
a parent table is used in a FROM clause, all the nested tables can be used in the SELECT 
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and WHERE clauses and nested subqueries as full-fledged tables. If two parent tables 
having a same name for a nested table are used in a relational operation, the nested tables 
should be qualified with the parent tables. 

Nested subqueries allow for accessing and transforming data inside nested 
relations. Nested subqueries can transform data in nested relations, nest, unnest and join 
data in nested relations with the data in its parents and handle operations such as 
ISEMPTY, AL_NEST, AL_NEST_SET and AL_UNNEST for NRDM data. The 
AL NEST operator creates partitions based on the formation of equivalence classes to 
generate nested tables. It operates on a row basis. AL_NEST_SET operator is similar to 
AL_NEST but operates on a set basis. The ALJJNNEST operator transforms a relation 
into one, which is less deeply nested by concatenating each tuple in the relation being 
unnested to the remaining attributes in the relation. 

The AL_NEST operator creates partitions based on the formation of 
equivalence classes to generate nested tables. Two tuples are equivalent if they have the 
same values for attributes, which are not being nested. AL_NEST operates on a row 
basis. Nesting can be done in two ways using a user interface (such as the GUI described 
above). A nested table can be dragged from the input to the output of a query transform 
and placed at the same or deeper level, or a nested schema can be created and columns 
from the input can be dragged and dropped into the newly created schema. 

An explicit FROM clause might be needed where two views are coming 
into a query transform, and columns are selected from only one the views. The generated 
language is to select from both the views. For nesting of two input views containing only 
scalar columns, selecting from the both the views at the same level might not be desired. 
The following example illustrates this. Given a flat view VI as: 

CREATE VIEW ORDERS (ORDER_ID INT, PROD_ID INT , QTY INT, 
CID INT, CCITY VARCHAR (65) ) 

CREATE VIEW VENDORS (PROD_ID INT, VNDR_ID VARCHAR (5), ' 

VNDR_CITY VARCHAR{65)) 

the table of flat relations shown in Fig. 23 results. A two level nesting to include vendor 
information using a JOIN can be demonstrated by the following example: 

CREATE VIEW V2 (ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE (PROD_ID INT, 

QTY INT, 
VENDOR_INFO 
AL_NZSTED_TABLE ( 

VNDR_ID CHAR (5), 
VNDR_CITY CHAR (65) 
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) 

) , 

CID, 
CCITY 
) 

AS SELECT ORDER_ID, 

AL_NEST (CREATE VIEW PROD_INFO (PROD_ID INT , QTY INT) 
AS SELECT PROD_ID, 
QTY, 

. . AL_NEST (CREATE VIEW VENTX)R_INFO 

(VNDR_ID CHAR (5), 
VNDR_CITY CHAR(65)) AS 
SELECT VNDR_ID, VNDR_CITY 
FROM VENDORS 

WHERE VENDORS - PROD_ID = LI . PROD_ID 
) 

AS VENDOR_INFO 
FROM ORDERS LI 

WHERE LI . ORDER_ID = LO . OR-DER_ID AND 
LI . CID = LO-CID AND 

LI . CCITY « LO. CCITY 

) 

AS PROD_INFO, 
CID , 

25 CCITY 

FROM ORDERS LO 

The explicit FROM clause prevents the usage of the VENDORS in the 
outermost select. This may produce a nested table as shown in Fig. 22, except with three 
30' rows with ORDERJD equal to 100, two rows with ORJDERJD equal to 200 and one 
row with ORDER JD = 300, because AL_NEST operates on a row basis, which can 

produce duplicates. 

The AL_NEST operator may be used to perform nesting on a set of rows 
also. If there is a GROUP BY, the set formed by the GROUP BY is used. If there are 
35 aggregate functions and a GROUP BY is specified, the set formed by the GROUP BY is 
used. If there are aggregate functions and a GROUP BY is not specified, then the default 
grouping is the entire table. All nested tables in the set operated by the AL_NEST may 
be merged. 

Using AL NEST SET with an A ggregate Function 
40 This operation may take in a view with nested tables and produce a single 

row, which has count of ORDER_ID's and the merge of all nested tables: 

CREATE VIEW V2 (NUM_ORDER3 INT, 

PROD INFO AL_NESTED__TABLE (PROD_ID INT, 
~ ~ QTY INT 

45 > 

) 

AS SELECT COUNT <ORDER_ID) , 

AL NEST SET (CREATE VIEW PROD_INFO (PROD_ID INT, 

QTY INT) AS 
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SELECT PROD_ID, QTY 
FROM PROD_INFO 

j 

AS PROD_INFO, 

5 

FROM VI 

Such a query might produce the table shown in Fig. 24. If the nested table(s) SELECT(S) 
have WHERE clauses, the nested table(s) might first be merged and the filters applied to 
the merged table(s). 
10 AT. IJNNEST 

The ALJJNNEST operator transforms a relation into one that is less 
deeply nested by concatenating each tuple in the relation being unnested to the remaining 
attributes in the relation. To unnest the vendor information from the nested table in Fig. 
22, the following ATL might be defined: 

15 CREATE VIEW V2 (ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE (PROD_ID INT, 

QTY INT , 

VNDR_ID CHAR(S))) 

AS SELECT ORDER_ID, 

20 AL_NEST (CREATE VIEW PR0D_INFO <PROD_ID INT, QTY INT) AS 

SELECT Vl.PROD_INFO.PROD__ID, 
VI . PROD_INFO . QTY , 

AL_UNNEST (CREATE VIEW VDR_INFO 
(WDR_ID INT) AS 

~c SELECT 

ZD VI . PROD_INFO . VEND OR_ INFO . VNDR_ID 

FROM VI . PROD_INFO . VENDOR_INFO) 

FROM VI . PROD_INFO) 
AS PROD_INFO 

30 FROM VI m 

WHERE clauses can be applied in the SELECT for unnestmg by drilling 
into the nested table which would produce a query transform, specifying the condition 
there, as shown in the following example: 

CREATE VIEW V2 (VNDR_ID CHAR<5), VNDR_CITY CHAR(65)) 
<ic * AS SELECT DISTINCT AL_T7NNEST (CREATE VIEW 

JJ ^ '" UNEST1 (VNDR_ID CHAR ( 5 ) , 

VNDR_CITY CHAR(65)) 
AS SELECT 

AL_0NNE ST ( CREATE . V I EW 
d() UNEST2 (VNDR^ID CHAR < 5 ) , 

CHAR(65)) 
AS SELECT VNDR_ID, VNDR_CITY 
FROM VENDOR_INFO) 
45 FROM PROD_INFO 

FROM VI 

Project 

An example of a simple projection from one hierarchical structure to 
50 another would be: 
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CREATE VIEW V2 ( 

ORDER_ID INT, 

PROD_INFO AL_tfESTEDJTABLE (PROD_ID INT, QTY INT) . 
) 

5 AS SELECT ORDER ID, 

AL_NEST (CREATE VIEW PROD_ INFO ( PROD_ID INT, QTY INT) 
AS~SELECT VI . PROD_INFO . PROD_ID , VI . PROD_INFO . QTY 
FROM VI .PROD_INFO) 
AS PROD_INFO 

10 FROM VI 

The qualifier V1.PRODJNFO in the nested relation is not really needed; the nested 
query could have been written using just PRODJNFO. The result might be the table 
shown in Fig. 25. 
Select 

!5 Filter conditions can be applied at various levels. Consider the example of 

view VI (Fig. 22) that has three levels of nesting. A filter on the nested relation 
PROD_INFO might be implemented as follows: 

CREATE VIEW V3 (ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE<PROD_ID INT, QTY INT} 

20 > 

AS SELECT 

ORDER_ID, 

AL_NEST (CREATE VIEW PROD_INFO ( PROD_ID INT, QTY INT) 

AS SELECT V1.PROD_INFO.PROD_ID, 
25 VI . PROD_INFO . QTY 

FROM VI . PROD_INFO 
WHERE VI .PROD_ INFO. QTY > 50) 

AS PROD INFO 



30 



FROM VI 



This may select all the rows from VI, but for the nested table PROD_INFO, only those 
rows are chosen where the quantity ordered QTY is greater than 50, resulting in the table 
shown in Fig. 26. 

Alternate Support For Filters In The WHERE Clause 

35 For. a nested table to be used in a WHERE clause sub-query, support 

within a WHERE clause should be available. If such support is not available, it can be 
overcome by using two stages and the ISEMPTY operator for nested tables. Nested 
tables can be used in a WHERE clause only with the ISEMPTY operator. The following 
example illustrates the use, selecting all the rows from VI that have ORDER JD greater 

40 than 100 and that have at least one product with a quantity ordered greater than 50. 

CREATE VIEW V3 (ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE(?ROD_ID INT , QTY INT), 
TEMP_PROD__INFO AL_N*ESTED__TABLE { PROD_ID INT, QTY 

INT) 

45 ) 

AS SELECT 

ORDER_ID, 
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AL_NE ST (CREATE VIEW FROD_INFO (PR0D_ID INT, QTY INT) 
AS SELECT VX.PROD — INFO.PROD_ ID, 

Vl.PROD_INFO.QTY 
FROM Vl.PROD_INFO ■ 
) 

AS PROD_INFO, 

AL_NE ST (CREATE VIEW PROD_INFO < PROD_ID INT, QTY INT) 
AS SELECT Vl.PROD_INFO.PROD_ID, 
Vl.PROD_INFO.QTY 

FROM VI . PROD_INFO 

WHERE VI. PROD_INFO . QTY > 50) 
AS TEMP_PROD_INFO 

FROM VI WHERE V!.ORDER_ID > 10 0 



CREATE VIEW V4 (ORDER_ID INT , 

PROD_INFO AL_NESTED_TABLE <PROE>_ID INT, QTY INT) 

) 

AS SELECT 

ORDER_ID, 

AL__NEST< CREATE VIEW PROD_INFO ( PROD_ID INT, QTY INT) 
AS SELECT Vl.PROD_INFO.PROD_ID, 
VI . PROD_INFO . QTY 

FROM VI . PROD_INFO 

) 

AS PROD_INFO 
FROM V3 WHERE ! ISEMPTY (TEMP_PROD_INFO ) 

loin 

Nested relations can be joined with any other relations. An example is 

given below: 

CREATE VIEW ORDERS (ORDERID INT , PRODUCTS 

AL NESTED TABLE (PRODID INT, PRODNAME VARCHAR (10))) 



CRESTS VIEW VENDORS (PRODID INT, VENDOR ID INT, 

VENDORNAME VARCHAR (10)); 

CREATE VIEW ORDERS_WITH_VENDORS (ORDERID INT, 

PRODUCTS AL_NESTED_TABLE (PRODID INT, 

PRODNAME VARCHAR (10), 
VENDOR ID INT) 

AS 

SELECT ORDERID , 

AL NEST (CREATE VIEW PRODUCTS ( PRODID INT, 

PRODNAME VARCHAR (10), 
VENDORID INT) 
AS SELECT PRODID, PRODNAME , VENDORID 
FROM PRODUCTS , VENDORS 

WHERE PRODUCTS . PRODID = VENDORS . PRODID) 

AS PRODUCTS 
FROM ORDERS GROUP BY ORDERID 



.0159302 A lj_> 



20 



WO 01/59602 PCT/US01/04698 

TsWrH Table Transform 

A system transform is available that takes in a flat view and produces a 
singleton that has a N integer scalar column with a value 1, and a nested table containing 
the input view. 
5 Tables as Parameters 

Tables can be used as parameters for imported functions. Given a function 
get_orders with an input parameter customer Jd and an output parameter orders: 

CREATE FUNCTION get_orders (cust_id int, 

orders al_NE S T ED JT ABLE (order_id int, ...) 
iq . OUTPUT, 

cust^info AXi__NE5TED__TABIiE (cust^name, . . ) 
OUTPUT) ; 



Get orders for each customer by calling the orders function: 

iers (cus 
or 

int, ..-) ) 



ic CREATE VIEW cus tomer_order s (customer_id int, 

1 " v ~" ~ orders AL__NESTED_TABLE (order_id 



AS SELECT customer_id, 

AL. 1TEST (get__orders ( customer^id). :: orders) 
20 AS orders 

FROM customers; 

if the function has multiple tables as outputs, and all or some of them are required, then 
the function has to be invoked multiple times: once for each output. 

CREATE VIEW customer^orders (customer_id int, 
" "~ *~ cust_info AL_NESTED_TABLE (cust_name, . . ) , 

orders AL_KE S ted_TAB LE (order_id 
int, ...) ) 

AS SELECT customer_id, 

AI>_NEST (get_orders (customer^id) : : cus t_inf o) AS 

30 cust info 

. ~ • • AL_NEST (get_orders { customer_id) : .-orders) AS orders 

FROM customers; 

As an optimization, the system could invoke the function only once and use those results 
for different instances within the query transform. For mapping a function returning 

35 table, a user would create a nested table column and map the nested table column to the 
function returning a table. The schema of the nested table may then be identical to the 
schema returned by the function. This is a concept of a "generated table". The schema 
definition of generated table cannot be modified, and it should disappear when the 
function is removed from the mapping. It should be represented differently in the UI so 

40 that a user can distinguish between a generated table and a non-generated table. 
Hierarchical File Reader 

A hierarchical file reader reads data generated by data flows that have 
functions that return tables. There are two main alternatives: model the file reader as an 
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XML file reader or model the f le reader using a proprietary format to represent 
hierarchical data. 

Fffect of NRDM on System Trans forms 

System transforms such as Table_Comparison, HierarchyJFlattening, etc. 

5 accept only rows with scalar columns. 

Table Comparison : The output schema of the table comparison transform 
is a generated schema and is same as the schema of the table being compared against. 
This transform may silently ignore columns that are nested tables. 

History Preservine : The output schema of the history preserving transform 
10 is same as the input schema, and this transform may preserve history only scalar columns 
and may act as pass through for columns that are nested tables. 

Effective Date : The transform may act as pass through for columns that are 

nested tables. 

Kev Generation : The output schema of the key generation transform is 
1 5 same as the input schema, and this transform may act as pass through for columns that are 
nested tables. 

Map Operation : The output schema of the map operation transform is 
same as the input schema, and this transform may not allow operations to be mapped for 
columns as nested tables and may act as pass through for them. 
20 Hierarchy Flattenine : Columns as nested tables cannot be a parent or child 

column of a hierarchy, but they can be dragged and dropped attribute columns and thus 
can appear in the output schema. 

Pivot ; The output schema of the hierarchy flattening transform is a 
generated schema and columns, as nested tables rnay be ignored. 
25 A Case Study 

A case study of a Sales Order IDoc using NRDM was performed. The 
IDoc was captured in a NRDM and perform transformations, to arrive at the same result 
as if the NRDM was not used, but with simplified specification of the transformations. 

An IDoc is divided into a control record, data records and a status record. 
30 Each control record and status record has numerous fields. For our purpose of validating 
the NRDM, we treated control records and status records as single varchar columns. The 
ATL to represent a Sales Order (some of the columns associated with nested tables might 
be omitted in the listing) is: 
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10 



15 



20 



25 



30 



35 



40 



45 



50 



55 



60 



CREATE VIEW VI ( 

CONTROL_RECORD VARCHAR (100) , 
STATUS_RECORD VARCHAR (10 0) , 
E2CMCCO AL_NESTED_TABLE ( 

ZEITP VARCHAR (2), .. , 
E2CVBUK AL_NESTEDJTABLB ( 

SUPKZ VARCHAR (1), . ., 
E2CVBAK AL_HESTED_TASLE ( 

SUPKZ VARCHAR (1) 



E2CVBKO AL_NEST£D_TABLE { 

SUPKZ VARCHAR (1) 

) , 

E2CVBPO AL_NBSTED_TABLE ( 

SUPKZ VARCHAR 

(1) , 

) , 

E2CVBAP AL_NESTED_TABLE ( 

SUPKZ VARCHAR (D 
E2CVBA2 



AL_NESTED_TABLE ( 



AL_NE STED_TABLE ( 



AL NESTED TABLE ( 



SUPKZ 



VARCHAR (1) , 

) , 



AL NESTED TABLE ( 



AL__NESTED_TABLE ( 



AX»__NE S TED_T AB L E ( 



AL_NESTED_TABLE ( 



AL__NE S TEDJT AB LE ( 



AL_NESTED_TABLE ( 



E2CVBUP 

SUPKZ 
VARCHAR ( 1 ) , 
) , 

E2CVBPF 

SUPKZ 
VARCHAR ( 1 ) 
) , 

E2CVBKD 



SUPKZ 
VARCHAR (1) , 
> , 

E2CKONV 

SUPKZ 
VARCHAR ( 1 ) , 
) , 

E2CVBPA 



SUPKZ 
VARCHAR ( 1 ) , 
) , 

E2CVBFA 

" SUPKZ 
VARCHAR (1) , 
) , 

E2CFPLT 

SUPKZ 
VARCHAR (1) , 
) , 

E2CVBEP 
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) # VI 



) # E2CMCC0 



) , # E2CVBAP 



SUPK2 
VARCHAR U) , 
) , 



) , # E2CVBAK 
) , # E2CVBUK 



10 The ATL corresponding to the population of the sales order fact table from 

the above view may be (with some columns omitted for illustration purposes): 



15 



20 



25 



30 



35 



40 



# VBAK. VBELN 

# VBAK. KUNNR 

# VBAP.PPSNR 

# V3AP.ERDAT 

# VBPA. KUNNR 

# VBUP.LFGSA 



CREATE VIEW V2 ( SO_NUM, 
• S0LDJT0 , 

LINE_ITEM_ID, 
CREATE_DATE # 
SHIPJTO, 
DELIVERY_STATUS 
) . 

AS SELECT AL__UNNEST 

(SELECT ALJONNEST 

(SELECT ALJONNEST 

(SELECT VBELN , KUNNR, 

AL_tJNNEST (SELECT POSNR, ERDAT, 

AL_UN*TEST (SELECT KUNNR FROM 

E2CV3PA 

WHERE PARVW - 

» > ) 

AL_ITNNEST (SELECT LFGSA FROM 

E2CVBUP) 

FROM E2CVBAP 
) 

FROM VI . E2CMCC0 . E2CVBUK . E2CVBAK 
) 

FROM VI . E2CMCC0 . E2CVBUK 



) 

FROM VI 



) 

FROM VI . E2CMCC0 
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WHAT T£ CLAIMED IS: : 

1 1. An apparatus for processing data representable in a hierarchical form, 

2 the apparatus comprising: 

3 an importer having inputs to receive a schema and a structured document from a data 

4 source, wherein the importer outputs a first nested relational data model 

5 (NRDM) data structure representing the structured document according to the 

6 received schema; 

7 an transformation engine that is capable of transforming the first NRDM data 

8 structure output by the importer into a second NRDM data structure according to 

9 a declarative specification of a transform; and 

10 an exporter having an input to receive the second NRDM data structure, wherein the 

j J exporter outputs a transformed hierarchical document in a data structure other 

j 2 than an NRDM data structure in a form suitable for a data target. 

j 3 2. The apparatus of claim 1, further comprising means for converting 

14 relational data to an NRDM data structure by vertically partitioning a relation and nesting 

15 parts of the relational data as a nested table. 

16 3. The apparatus of claim 1, further comprising means for converting 

1 7 nested relational data to relational data by unnesting the nested tables using a 

1 8 cross-product between a parent row and a child subtable. 

19 4. The apparatus of claim 1, further comprising means for performing a 

20 grouping operation on a nested table that generates a resulting nested table containing a 

21 union of all the nested tables grouped by the operation. 

22 5. The apparatus of claim 1, further comprising means for performing 

23 multi-step transformations, wherein an input to a transformation is results of a previous 

24 transformation, a data source, or both. 

25 6. The apparatus of claim I, wherein the transformation engine operates on 

26 rules that are applied to data independent of data format. 

27 7. The apparatus of claim 1, wherein the exported is adapted to output one 

28 or more of an XML file, a relational table or a flat file. 

25 
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29 8. A metadata r lapper comprising: 

30 an input for receiving a document description for hierarchical documents; and 

3 1 an output for outputting an NRDM data structure representing the document 

32 description. 

33 9. An apparatus for transforming data representable in a hierarchical form, 

34 the apparatus comprising: 

35 an importer having inputs to receive a schema and a structured document from a data 

36 source, from a data transformer, or from both, wherein the importer outputs a 

37 first nested relational data model (NRDM) data structure representing the 

38 structured document according to the received schema; 

39 an transformation engine that is capable of transforming the first NRDM data 

40 structure output by the importer into a second NRDM data structure according to 

4 1 a declarative specification of a transform; and 

42 an exporter having an input to receive the second NRDM data structure, wherein the 

43 exporter outputs a transformed hierarchical document in a data structure other 

44 than an NRDM data structure in a form suitable for a data target. 

45 1 0. A method for providing data to an application through a data platform 

46 in a computer system in response to request from the application, the method comprising: 

47 accepting declarative rules for accessing the data from data sources and declarative 
4g rules for transforming the data into a format requested by the application; 

49 mapping relational and non-relational data sources to an NRDM data structure; 

• 50 interpreting a request; 

" 51 retrieving data from the data sources; 

52 transforming the data according to the declarative rules; and 

53 returning the transformed data to the application. 

54 11. The method of claim 10, wherein requests are processed as messages 

55 and request messages contain sufficient information to drive data extraction into a 

56 data-oriented interface. 

57 12. The method of claim 10, wherein the requests are application 

58 programming interface function calls. 
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59 13. A method for updating a plurality of data targets from a message, 

60 comprising: 

61 making an update request through a data-oriented interface; 

62 specifying declarative rules for updating the data targets; 

63 importing metadata that maps relational and non-relational data targets to NRDM 

64 data structures; 

65 interpreting incoming update requests; 

66 transforming the data according to the declarative rules; and 

67 updating the data targets. 

14. The method of claim 13, further comprising: 

69 making an update request using an application; and 

70 causing one of a response to be sent to the application, an update of data, or both. 

71 15. The method of claim 13, further comprising a step of combining the 

72 update request with other data before updating the data targets. 

73 16. A method of providing input to an application expecting one or more 

74 tables as parameters to an input message, the method comprising: 

■ 75 mapping data in a NRDM data structure to function parameters; and 

76 making a function calls to the application using the NRDM mapped data structure. 
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Corresponding DTD Definition 



<?xml encoding ='UTF-8'?> 

<!ELEMENT Order {OrderNo, CustIO, ShipTol, ShipTo2, Li nolle ms-0> 

<! ELEMENT OrdorNo (3PCDATAW 

<! ELEMENT CustlO ($PCDATA)> 

<! ELEMENT ShipTol (#PCDATA)> 

<! ELEMENT ShipTo2 (#PCDATA>> 

<! ELEMENT Lineltems (item. ItemOiy. ltemPrice)> 

<i ELEMENT Item (3PCDATA> 

<! ELEMENT ItemQty (*PCDATA)> 

<!ELEMENT ItemPrice (#PCDATA)> 
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NESTED RELATIONAL DATA MODEL 

COPYRIGHT NOTICE 
A portion of the disclosure of this patent document contains material that 
is subject to copyright protection. The copyright owner has no objection to the facsimile 
5 reproduction by anyone of the patent document or the patent disclosure as it appears in 
the Patent and Trademark Office patent file or records, but otherwise reserves all 
copyright rights whatsoever. 

FTRLD OF TTTF INVENTION 
The present invention relates to information management in general and 
10 more particularly to methods for using Nested Relational Data Models (NRDMs) to 
manage information. 

BACK GROT IMP OF THE INVENTION 
Information is commonly managed in units of documents. For example, 
sales, distribution and manufacturing information might be contained within documents 
15 such as sales invoices or orders. Increasingly, documents pass between parties in 

electronic form, in a process generally referred to as EDI (Electronic Data Interchange). 
In electronic form, the documents are not limited to the text and images shown on the 
printed page, but can include formatting and "metadata" (data about the data). One 
example of a format for an electronic document that contains metadata is the Extended 

20 Markup Language (XML). 

Several products on the market allow mapping of XML documents to SQL 
tables or vice versa and several products on the market allow mapping of EDI documents 
to relational tables or vice versa, but these products typically require procedural 
specifications of how to perform the conversion, such as prograrrnning code. Traditional 

25 Relational Database Management Systems (RDMS's) such as described by Date or 
Ullman or implemented by Oracle, IBM, Microsoft and others as well as distributed 
databases as described in Ceri or U.S. Patent Nos. 5.8S4.3 10 and 5,596,744, implement 
declarative transformations of relational data. 

A class of systems called intelligent gateways (such as Sybase's 

30 OmniServer system) allow declarative rules to be transparently applied to heterogeneous 
relational databases. Another class of systems called Replication Servers (such as 
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described by U.S. Patent No. 5,737,601 or implemented as Sybase's Replication Server, 
Oracle's Replication Server, or the like) can provide homogeneous or heterogeneous data 
replication. 

Additional class of systems called the ETL (Extraction, Transformation, 

5 Loading) systems such. as Microsoft DTS, Inforrnatica PowerMart and D2K Tapestry 
provide extraction, transformation and loading of heterogeneous data between relational 
database systems. Some of these products support converting hierarchical files into a 
relational form by "flattening" the hierarchical files, making multiple passes through a 
hierarchical file and, at each pass, pulling out different parts of the hierarchy. 

jq Yet another class of systems that address mapping of relational data to a 

programming object, as exemplified by U.S. Patent Nos. 6,175,837, 6,163,781, 
6,134,559, 5,907,846, 5,873,093, 5,832,498, or products from Persistence, Bea and 
others. This class of tools maps persistently stored relational data to an object-oriented 
memory representation as well as mapping the data from an object-oriented memory 

15 representation to a set of persistent relational tables. 

Another class of prior art exists that provides object-oriented access to 
non-relational databases, as described in U.S. Patent Nos. 5,799,313, 5,778,379, and 
5,542,078. This class of systems addresses the mapping of data from hierarchical 
databases such as IMS, object oriented databases and relational databases to an 

20 object-oriented programming obj ect or database. 

Considerable research has been done on Nested Relational Data Models as 

described in , "Lecture Notes in Computer Science Volume 595: M. Levene - The 

Nested Universal Relation Database Model" and , "Lecture Notes in Computer 

Science Volume 36 1 : S. Abiteboul et al. - Nested Relations and Complex Objects in 

25 Databases". That research focused mainly on defining the data model and specific 
operations on it. 

It is known to graphically map disparate schemas to each other. See, for 
example, U.S. Patent Nos. 5,850,631 and 5,806,066. It is also known to map data 
between different structures. See for example, U.S. Patent Nos. 5,627,972 and 5,1 19,465. 

30 Si TTVfM ARY OF THE INVENTION 

In one embodiment of data processing system according to the present 
invention, hierarchical documents or hierarchical messages are mapped to a Nested 
Relational Data Model to allow for transformation and manipulation using declarative 



ISDOCID: <WO 0159602A1JA> 



WO 01/059602 



PCT/US01/04698 



statements. The resulting nested data can be converted to a relational format and mapped 
to multiple relational tables, and/or converted from a nested relational format to an 
external hierarchical format, such as XML. 

The system can specify and execute declarative rules to extract, transform, 
5 integrate, load and update hierarchical and relational data. The system can also be used 
for extending documents with relational and non-relational data and applying updates 
based on these documents to relational database targets. The system can also be used for 
mapping Nested Relational Data to function calls that accept tables as parameters and 
return multiple scalar and table parameters as output. 

10 BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a table that is related to a single row of another table. 
Fig. 2 shows the data of Fig. 1, organized as multiple rows in a single 

table. 

Fig. 3 shows the data of Fig. 1, organized as multiple tables related by a 

15 join. 

Fig. 4 illustrates multiple levels of nested tables contained in one column. 

Fig. 5 illustrates a more general example of multiple levels of nested tables 
contained in more than one column. 

Fig. 6 is a block diagram of a database system according to one 
20 embodiment of the present invention. 

Fig. 7 illustrates schema relating to nested tables; Fig. 7 A shows input 
tables and Fig. 7B shows an output schema. 

Fig. 8 illustrates a process of grouping values across nested tables. 

Fig. 9 illustrates a process of unnesting data; Fig. 9 A shows how a table 
25 with a nested table would be unnested into a cross-product of the parent table and a child 
(nested) table; Fig. 9B illustrates unnesting into separate tables; Fig. 9C illustrates 
unnesting at multiple levels. . 

Fig. 10 illustrates a case where unnesting might produce unintended 

effects. 

30 Fig. 1 1 graphically illustrates an unnesting process and its effects on a 

query. 

Fig. 12 illustrates a process of converting a DTD to tables. 
Fig. 13 illustrates the XML encoding of a DTD definition. 
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Fig. 14 illustrates various real-time data flows. 
Fig. 15 illustrates an operation of joining two inputs in a query. 
Fig. 16 illustrates real-time data flows that use supplementary information. 
Fig. 17 illustrates data flows depending on cached values. 
Fig. 18 illustrates branching data flows based on rules. 
Fig. 19 is an illustration of a complex real-time data flow. 
Fig. 20 is an illustration of a GUI for specifying a data flow. 
Fig. 21 is a block diagram of a schema conversion system. 
Figs. 22-26 are tables illustrating various aspects of an NRDM system. 

n*Sf!RTPTIO N OF THE SPF f-TPTC EMBODIMENTS 
In a specific embodiment a Nested Relational Data Model (NRDM) is 
designed to support hierarchical and relational components used to represent business 
data. Business documents are typically hierarchical with multiple ^eating sets. For 
example, an order contains a set of repeating line items. It may also have a set of 
15 customers associated with it. 

Business documents used to exchange data between software systems 
within an enterprise or between enterprises need to be represented as complex 
hierarchical documents. The industry and the research community use well-known 
representations such as EDI and XML to capture and represent such documents. The 
20 system described herein provides methods for mapping such documents to aNested 
Relational format, methods for transforming and manipulating of these documents 
represented using the Nested Relational Data Model, converting such documents to 
relational format and mapping them to multiple relational tables, and a method of 
converting the data in a nested relational format back to an external hierarchical format 
25 • such as XML. 

The system provides a method to apply declarative rules to map the 
hierarchical (e.g., XML or EDI) data to relational tables and vice versa; declarative rules 
to enrich hierarchical data with data from other relational or hierarchical sources; 
declarative rules to perform multi-stage transformations. The system allows declarative 
transformations to be applied to hierarchical data, and the ability to transparently apply 
rules to heterogeneous databases and files; as well as in the ability to apply multi-stage 
transformations. Delcarative specifications (such as SQL) describe what to do with data, 
as opposed to procedural specifications (such as C++ code) that described how to do it. 
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"Nested data" is data in a table that is related to a single row of another 
table. Sales orders are often presented using nesting: the line items in a sales order are 
related to a single header. For a table of sales order headers, each row includes its own 
table of line items. An example of this is shown in Fig. 1. Of course, the same data could 
5 be represented without nested tables. For example, the data could be represented as 

multiple rows in a single table as shown in Fig. 2, or as multiple tables related by a join as 
shown in Fig. 3. 

One source of data for a nested table is the result of a query using the 
values in the related row in the parent table. As used herein, "parent table" refers to a 

10 table within which another table is nested and "child table" or "nested table" refers to a 
table that is nested in a column of a parent table. A nested table is said to have a 
relationship with the table within which it is nested and where levels are associated with 
tables, a parent table would have a level that is designated with a number one higher than 
the child tables nested in that parent table. For example, Fig. 4 shows a parent table 10, a 

1 5 nested (child) table 12 one level below table 1 0 and nested tables 14(a)-(b) that are nested . 
in table 12 and are two levels below table 10. 

Preferably, a unique instance of each nested table exists for each row at 
each level of a relationship. As illustrated in Fig. 5, each row at each level can have any 
number of columns containing nested tables. 

20 Fig. 6 shows various aspects of a database system 1 00 that handles NRDM 

data. System 100 is shown comprising a metadata mapper 104 that maps DTD 102 
w/hierarchical structures to NRDM schema that are stored in schema storage 106. These 
components are shown as being part of a preprocessing section, with other portions being 
part of a real-time section, but it should be understood that all of the process or none of 

25 the processing might be done in real-time without departing from the essence of the 
invention. Notwithstanding that caveat, the descriptions below reference an example 
wherein DTDs are converted to NRDM schema and stored and documents are converted 

by system 1 00 in real-time after such conversion. 

One such real-time process involved a document 1 10 being passed to an 

30 importer, then to a transformation engine (TE) 1 1 4 and an exporter 1 1 6 to result in a 
document in a new format 1 18 (in some cases, the formats of document 1 10 and 
document 118 might be the same, but some transformation has occurred). Document 1 10 
is a structured document, such as an XML document, an HTML page, a document having 
other structure, or other structured data object. 
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Importer 1 12 converts the document into NRDM data so that TE 1 14 can 
operate on data in the NRDM space, thus simplifying many transform operations, as 
described below. TE 1 14 accepts data in NRDM format as its input and outputs data in 
NRDM format. Of course, data in NRDM (Nested Relational Data Model) format need 

5 not have nested data (for example, if the input data can be structured such that nesting is 
not needed). Because TE 1 14 operates on NRDM structures, the transformations 
performed by TE 1 14 can be expressed simply as a declarative specification, thus greatly 
simplifying the process of transforming complex data. In effect, importer 1 12 .converts a 
hierarchical document into a relational database form to which declarative statements can 

10 be applied. 

Exporter 116 exports the data in a suitable form, such as XML documents, 
relational tables or flat files. 
Data Flows 

In a graphical interface used to build data flows and/or nested data 

1 5 structures, such as the ActaWorks™ system developed by Acta, Inc. structures of nested 
data in input and output schemas of sources, targets, and transforms in data flows are 
presented to a designer. An example of an input schema 60 is shown in Fig. 7A and an 
example of an output schema 62 is shown in Fig. 7B. Input schema 60 shows a table A 
that has columns columnl, column2 and a column for a nested table B, which in turn has 

20 columns column4 and column5. Input schema 60 also shows a table Z that has columns 
columnl 1, columnl 2 and a column for a nested table Y, which in turn has columns 
columnl4 and columnl 5. In Fig. 7A and others, nested tables appear with a table icon 
paired with a plus sign, which indicates that the object contains columns (a minus sign 
indicates that the object is open and if it has columns, those columns are visible. 

25 In a relational database system (RDS) using a declarative language such as 

SQL, a query transform might take the form of a SELECT statement that is executed by 
the RDS. When working widi nested data in an nested relational data model (NRDM) 
system according to some aspects of the present invention, the query can specify 
SELECTS at each level of a relationship denned in the output schema. Thus, while a 

30 SELECT statement might be constrained to include only references to relational data sets, 
a query that;includes nested data might include a SELECT statement to define operations 
on each table in the output-each context for the input data set is transformed. 

In such an NRDM system, the FROM clause descriptions and the behavior 
of the query are the same with nested data as with relational data, but the new interface of 
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contexts allows the data flow designer to distinguish multiple SELECTS from each other 
within a single query. At any context, the FROM clause can contain any top-level table 
from the input or any table that is a column of a table in the FROM clause of the next 
higher context. 

5 When rows of one table (a child table) are nested inside another table (a 

parent table), the data set produced in the nested table is the result of a query against the 
first table using, the related values from the second table. For example, if sales 
information is available as a header table and a line-item table, the sales information can 
be organized as a parent table of header information and a child table containing line-item 

10 data here the line-items are nested under the header table. The line items for a single row 
of the header table are equal to the results of a query including the order number, as might 
be found using the following statement: 

SELECT * FROM Lineltems 
15 WHERE Header . OrderNo = Lineltems .OrderNo 

Correlation can be used to construct a nested table from columns from a 
higher-level context. In a nested-relational model, the columns in a nested table are 
implicitly related to the columns in the parent row. To take advantage of this 

20 relationship, the parent table can be used in the construction of the nested table. The 
higher-level column is a correlated column. Including a correlated column in a nested 
table may serve at least two purposes: 1) the correlated column is a key in the parent table 
and 2) making the correlated column an attribute in the parent table. Including the key in 
the nested table allows for the maintenance of you a relationship between the two tables 

25 after converting them from the nested data model to a relational model. Including the 
attribute in the nested table allows for the use of the attribute to simplify correlated 
queries against the nested data. 

Correlated columns can include columns from the parent table and any 
other tables in the FROM clause of the parent table. If the correlated column comes from 

30 a table other than the immediate parent, the data in the nested table includes only the rows 
that match both the related values in the current row of the parent table and the value of 

the correlated column. 

Values can be grouped across nested tables. Thus, when a statement 

includes a Group By clause for a table with a nested table, the grouping operation 
3 5 combines the nested tables for each group. For example, to assemble all the line items 
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included in all the orders for each state from a set of orders, the designer would set the 
. Group By clause in the top-level of the data set to the state column (Order.State) and 

create an output table that includes State column (set to Order.State) and Lineltems nested 

table. The result of such an operation might result with the table shown in Fig. 8. The 
5 result is a set of rows (one for each state) that has the State column and the Lineltems. 

nested table that contains all the Lineltems for all the orders for that state. 

Nested data can also be unnested. When loading a data set that contains 

nested tables into a relational (non-nested) target, the nested rows will be unnested. Take, 

for example, a message containing a sales order that uses a nested table to define the 
10 relationship between the order header and the order line items. To load the data into • 

relational tables, the multi-level must be unnested. Unnesting a table produces a 

cross-product of the top-level table (parent) and the nested table (child), as shown in Fig. 

9A. Different, columns from different nesting levels might be loaded into different tables. 

A sales order, for example, may be flattened so that the order number is maintained 
15 separately with each line item and the header and line item information loaded into 

separate tables, as shown in Fig. 9B. 

Any number of nested tables can be unnested at any depth. No matter how 
many levels are involved, the result of unnesting tables is a cross product of the parent 
and child tables. When more than one level of unnesting occurs, the inner-most child is 
20 unnested first, then the result-the cross product of the parent and the inner-most child-is 
then unnested from its parent, and so on to the top-level table, creating the result shown in 
Fig. 9C. 

Unnesting all tables (cross product of all data) may not produce the results 
intended. For example, if multiple customer values are included in an order, such as 
25 ship-to and bill-to addresses, flattening a sales order by unnesting customer and line item 
tables produces rows of data that may not be useful for processing the order. This is 
illustrated in Fig. 10. Using the GUI, the specification of the data flow is shown in Fig. 
11. 

A DTD (document type definition) describes the data schema of an XML 
30 message or file. Real-time data flows read and write XML messages based on a specified 
DTD format. One DTD can describe multiple XML sources or targets. Batch data flows 
can read and write data to files based on a specified DTD format. 

DTDs can be imported into the NRDM system, either directly or by 
importing an XML document that contains a DTD. During import, the NRDM system 
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converts the structure defined in the DTD into an internal nested-relational data model. 
Elements below the root-level that contain other elements become nested tables and 
elements that do not contain other elements become columns. Attributes become . 
columns in the corresponding element's schema. 
5 The NRDM system applies the following rules to convert the DTD to 

tables, columns, and nested tables: 

- Any element that contains PCDATA only and no attributes becomes a column. 

- Any element with attributes or other elements (or in mixed format) becomes a table. 

- An attribute becomes a column in the table corresponding to the element it supports. 
10 - Any occurrence of choice operators is converted to strict ordering. 

- Any occurrence of optional operators is converted to strict ordering. 

- Any occurrence of 0* or () + becomes a table with an internally generated name-an 
implicit table. 

After these rules have been applied, the NRDM system optimizes the 
15 format using two more rules, except where doing so would allow more than one row at 
the root element: 

- If an implicit table contains one and only one nested table, then the implicit table can 
be eliminated and the nested table can be attached directly to the parent of the implicit 
table. For example, the SalesOrder element might be defined as follows in the DTD: 

< I ELEMENT SalesOrder (Header, Lmeltems*)> 

When converted, the Lineltems element with the zero or more operator would 
become an implicit table under the SalesOrder table. The Lineltems element itself 
25 would be a nested table under the implicit table, as shown in Fig. 12A. Because 

' the implicit table contains one and only one nested table, the format would be 
optimized to remove the implicit table, as shown in Fig. 12B. 

- If a nested table contains one and only one implicit table, then the implicit table can 
be eliminated and its columns placed directly under the nested table. For example, the 

30 nested table Lineltems might be defined as follows in the DTD: 

< ! ELEMENT Lineltems (ItemNum, Quantity) *> 

When converted, the grouping with the zero or more operator would 
35 become an implicit table under the Lineltems table. The ItemNum and Quantity elements 
would become columns under the implicit table, as shown in Fig. 12C. Because the 
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Lineltems nested table contained one and only one implicit table, it would be optimized 
to remove the implicit table, as shown in Fig. 12D. ' 

If the DTD contains an element that uses an ancestor element in its 
definition, the definition of the ancestor can be expanded for a fixed number of levels. 
5 For example, given the following definition of element "A" : 

A : B , C 
B: E, F 
F : A, .H 

10 The system produces a table for the element "F" that includes an expansion of "A." In this 
second expansion of "A," "F" appears again, and so on until the fixed number of levels. 
In the final expansion of ''A," the element "F" appears with only the element "H" in its 
definition. 
Real-Time Sources 

1 5 A real-time source in a real-time data flow determines the message that the 

real-time data flow will process. The source object represents the schema of the expected 
messages. Messages received are fit to the schema. Real-time data flows accept 
real-time source types such as Extensible Markup Language formatted (XML) messages 
or intermediate documents, such as IDocs published from an SAP R/3 application server. 

20 The format of the XML message is specified by a document type 

definition (DTD). The DTD describes the schema of data contained in the message and 
the relationships among the elements in the data. For a message that contains information 
to place a sales order-order header, customer, and line items-the corresponding DTD 
includes the order structure and the relationship between data, as shown by the example 

25 in Fig. 13. 

The following examples provide a high-level description of how real-time 
data flows address typical real-time scenarios. Fig. 14A shows a real-time data flow as 
might be used to load transactions into an ERP system, such as SAP R/3. A real-time 
data flow can receive a transaction from an electronic commerce application and load it to 
30 an ERP system. Using a query transform, one can include values from a data warehouse 
to supplement the transaction before applying it against the ERP system. 

Fig. 14B shows a real-time data flow for collecting ERP data into a 
warehouse. Real-time data flows can receive messages from the ERP through IDocs. 
Each IDoc contains a transaction that the real-time data flow can load into a data 
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warehouse or a data mart. In this way, IDocs can be used to keep the data in a warehouse 
current. 

Fig. 14C shows a real-time data flow for retrieving values from a cache or 
and ERP system. This allows for real-time data flows that use values from a data 

5 warehouse to determine whether or not to query the ERP system directly. 
Sup plementary Sources 

When more data is needed than what is provided in the content of a 
message to complete the message processing, supplementary sources might be used. For 
example, processing a message that contains a sales order from an electronic commerce 

10 application that contains the customer name might require that, when the order is applied 
against your ERP system, more detailed customer information is needed. Inside the 
real-time data flow, the message is supplemented with the customer information to 
produce the complete document to send to the ERP system. The supplementary 
information may come from the ERP system itself or from a cache containing the same 

15 information cached. Examples of such data flows are shown in Figs. 15, 16A, 16B. 

Tables and files (including XML files) as sources in real-time data flows 
can provide this supplementary information. The real-time data flow extracts data from 
the supplementary source as indicated by the logic defined in the real-time data flow. 

Tables or files that are used as sources and have a cache option allow for 

20 the data extracted to be stored in memory until the data flow processing is complete. In 
real-time data flows, sources should not be cached unless the data being cached is small 
and is unlikely to be updated in the life of the real-time data flow. 

In batch data flows, caching can improve the performance of data flow 
processing by reducing the number of times a set of data is read from the database or file 

25 source. In real-time data flows, however, the improvement in performance provided by 
caching is minimized by the likelihood that the real-time data flow reads only a small 
amount of data from the source for any given message. In addition, because the real-time 
data flow reloads cached data only when an access server shuts it down and restarts it, 
cached data may become stale in memory. 

30 Tables can be sources in real-time data flows after their metadata is 

imported into the repository. When the real-time data flow starts, it opens a connection to 
the source database. This connection remains open as long as the real-time data flow is 
running. If a table is included in a join with a real-time source, the data set from the 
real-time source is included as the outer loop of the join. 
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R/3 tables can be sources in real-time data flows after their metadata is 
imported into the repository. When the real-time data flow performs a query against the 
R/3 table, it executes an R/3 function call to extract the data through the SAP R/3 
application server. This method of extracting data from SAP R/3 is particularly well 
5 suited to extracting a small amount of specific data (on the order of 1 to 10 rows) in a 
real-time system, but might not work well as a substitute to using R/3 data flows to 
produce ABAP programs to extract large amounts of data in a batch system. 

Data from XML files can be used as sources in real-time data flows, if a 
DTD that describes the data in the file is imported. 

10 Sup plement ^ M^sage Data 

• ~ Th e data included in messages from real-time sources may not map 

exactly to requirements for processing or storing the information. If not, steps can be 
■ - defined in the real-time data flow to supplement the message information. One technique 

for supplementing the data in a real-time source includes these steps in a real-time data 
15 flow: 

1 . Include a table or file as a source. In addition to the real-time source, include the 
files or tables that supply the supplementary information. 

2. Use a query to extract needed data from the table or file. Use the data in the 
real-time source to find the needed supplementary data. A join expression can be 
used in the query so that only the specific values required from the supplementary 
source are extracted. 

Fig. 16A shows an example where a message includes sales order information with the 
ultimate goal to return order status. In this case, the business logic uses the customer 
number and priority rating to determine the level of status to return. The message 
includes only the customer name and the order number. The real-time data flow is then 
defined to retrieve the customer number and rating from other sources before determining 
the order status. 

A real-time data flow might include logic to determine when responses can 
be generated from data in a cache and when they, must be generated from data in an ERP 
30 system. One technique for constructing this logic includes, the steps in the real-time data 
flow (illustrated in Figs. 1 7-20): 
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1 . Determine the rule for when to access the cache and when to access the ERP 
system. 

2. Compare data from the real-time source with the rule. 

3. Define each path that could result from the outcome. Consider the case where the 
5 rule indicates ERP access, but the ERP system is not currently available. 

4. Merge the results from each path into a single data set. 

5 . Route the single result to the real-time target. 

This example describes a section of a real-time data flow that processes a new sales order. 
The section is responsible for checking the inventory available of the ordered products-it 
10 finds an answer to the question, "is there enough inventory on hand to fill this order?" The 
rule controlling access to the ERP system indicates that the inventory (Inv) must be more 
than a pre-determined value (IMargin) greater than the ordered quantity (Qty) to consider 
the cached inventory value acceptable. The comparison is made for each line item in the 
order. 

15 Fig. 18 illustrates a branch in the data flow based on a rule. An XML 

source contains the entire sales order, yet the data flow compares values for line items 
inside the sales order. The XML target that ultimately returns a response requires a single 
row at the top-most level. Because this data flow needs to be able to determine inventory 
values for multiple line items, the structure of the output requires the inventory 

20 information to be nested. The input is already nested under the sales order; the output can 
use the same convention. In addition, the output needs to include some way to indicate 
that the inventory is or is not available. 

Fig. 19 illustrates several ways to return values from the ERP. For 
example, a lookup function or a join on the specific table could be used in the ERP 

25 system. The example uses a join so that the processing can be performed by the ERP 
system rather than the NRDM system. As in the previous join, if a value might not be 
returned by the join, an outer join can be defined so that the line item row is not lost. 

Fig. 20 illustrates a GUI used to specify transformations and a specific 
transformation Specified with that GUI. 

30 Fig. 21 is a block diagram of a schema converter. In the example shown, 

an NRDM schema is converted to a DTD schema. 
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Other Uses 

One of the advantages of operating a transformation engine on NRDM 
data structures, as described above, is that the transformation engine can operate on 
hierarchical data as if it were a relational table. Thus, hierarchical documents, such as 

5 XML documents can be operated on using declarative statements, such as SQL, 

regardless of how many levels of hierarchy are present. One method of effecting such a 
benefit is to nest child tables into columns of parent tables and use a transformation 
engine that handles NRDM data as its input and as its output. The transformation engine 
can be sandwiched between an importer that converts hierarchical documents into NRDM 

10 data structures and an exporter that generates hierarchical documents from NRDM data 
structures. 

There are various ways to implement NRDM data structures. For 
example, conventional relational tables can be used, where a columnof the parent table 
stores a pointer to a child table. A separate child table could exist for each row of the 

15 parent table that does not have a NULL value for that row and column, or where the child 
tables for each row have corresponding formats, the data representing the child tables 
could be implemented as subtables of one child data-holding table. Regardless of the 
underlying structure, the transformation engine deals with the data structures as nested 
tables and applies declarative statements accordingly. 

20 Other aspects of the system described herein might find uses apart from 

NRDM data structures and systems. For example, requests received from applications for 
data processing and/or transformation might operate on nested tables, but might also 
operate on conventional relational tables. 

The applications often provide application programming interfaces (APIs) 

25 through with other programs interact with the application. Often, the designer of a . 

program that interacts with the application must know the interfaces and correctly specify 
the parameters of a particular function call. However, some applications might accept as 
an input NRDM data or a hierarchical document. In some cases, the application interface 
could be such that the semantics of the function call are in a document submitted as a 

30 parameter and then one generic interface is all that is needed to call the application. 
F.xample Tmplementation 

An example of an NRDM system according to various aspects of the 
present invention will now be described. It should be understood that the invention is not 
limited to this specific example. The example system supports hierarchical data models 
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such as IDoc and XML and provides for a hierarchical structure to support a hierarchical 
data model represented as a single row that contains scalar columns and repeating 
group(s) of embedded rows forming nested table(s), where nesting can be arbitrarily deep 
and an implicit relationship is not required between embedded rows and parent (i.e., the 
5 children rows do not need to contain a key to join it back to the parent row). 

The NRDM system can capture an entire business transaction in a single 
hierarchical structure and transform a hierarchical structure as a single entity using 
relation operators that can be applied at any level of the hierarchy. A hierarchical 
structure when applied as a single database transaction can be loaded to a set of tables 
10 belonging to a single datastore. 
Data Model 

In NRDM, the first normal form requirement that a column be a scalar is 
removed. In NRDM, a column can be a scalar or a relation value, which we refer to as a 
nested table. A scalar column definition has a name, type (including length, precision, 
1 5 domain info, etc.) and, at run time, contains either a value or a NULL indicator. A nested 
table definition has a name, schema (e.g., a list of column definitions) and, at run time, 
contains either one or more rows of the schema specified in the nested table definition or 
an empty table indicator (e.g., ISEMPTY). 
DDL Operations 

20 ALJNESTED_TABLE is used below to define a nested table for DDL 

operations. For example, creating a view with nested table might be done by the 
following statements: 

CREATE VIEW VI ( 

ORDER_ID INT, 
25 FROD_INFO AIi_NESTED_TABIiE ( 

PROD_ID INT, 
QTY INT, 

VENDOR__INFO AL_NE ST ED JT ABLE ( WDR_ID CHAR ( 5 ) , 

VNDR_CITY CHAR (65)) 

30 . >' 

CID INT/ 
CCITY CHAR (65) 
) ; 

Fig. 22 illustrates a data table that might result for the above statements. 
35 T)ML O perations 

Relational operations such as select, project, etc. can be used on NRDM 
data. Nested relations can be accessed as regular relations in the context (scope) of their 
parents. In other words, wherever a scalar column is used, a nested table can be used. If 
a parent table is used in a FROM clause, all the nested tables can be used in the SELECT 
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and WHERE clauses and nested subqueries as full-fledged tables. If two parent tables 
having a same name for a nested table are used in a relational operation, the nested tables 
should be qualified with the parent tables. 

Nested subqueries allow for accessing and transforming data inside nested 
relations. Nested subqueries can transform data in nested relations, nest, unnest and join 
data in nested relations with the data in its parents and handle operations such as 
ISEMPTY, AL.NEST, AL_NEST_SET and ALUNNEST for NRDM data. The 
AL_NEST operator creates partitions based on the formation of equivalence classes to 
generate nested tables. It operates on a row basis. AL_NEST_SET operator is similar to 
AL_NEST but operates on a set basis. The AL_UNNEST operator transforms a relation 
into one, which is less deeply nested by concatenating each tuple in the relation being 
unnested to the remaining attributes in the relation. 

The AL_NEST operator creates partitions based on the formation of 
equivalence classes to generate nested tables. Two tuples are equivalent if they have the 
same values for attributes, which are not being nested. AL.NEST operates on a row 
basis. Nesting can be done in two ways using a user interface (such as the GUI described 
above). A nested table can be dragged from the input to the output of a query transform 
and placed at the same or deeper level, or a nested schema can be created and columns 
from the input can be dragged and dropped into the newly created schema. 

An explicit FROM clause might be needed where two views are coming 
into a query transform, and columns are selected from only one the views. The generated 
language is to select from both the views. For nesting of two input views containing only 
scalar columns, selecting from the both the views at the same level might not be desired. 
The following example illustrates this. Given a flat view VI as: 

CREATE VIEW ORDERS (ORDER_ID INT, PROD_ID INT, QTY INT, 
CID INT, CCITY VARCHAR(65)) 

CREATE VIEW VENDORS (PROD ID INT, VNDR_ID VARCHAR ( 5 ) , 

VNDR_CITY VARCHAR ( 65 ) ) 

the table of flat relations shown in Fig. 23 results. A two level nesting to include vendor 
information using a JOIN can be demonstrated by the following example: 

CREATE VIEW V2 (ORDER ID INT, 

CREATE VA* pRoc .-^ ^ mgTgD TABLE ( p R0 D_ID INT, 

- - ~ QTY INT, 

VENDOR_INFO 
AL_NESTED_TABIjE ( 

VNDR_ID CHAR (5), 
VNDR_CITY CHAR (65) 
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) 

) , 

CID, 
CCITY 

5 ) 

AS SELECT ORDER_ID, 

AL_NEST (CREATE VIEW PROD_INFO (PROD_ID INT, QTY INT) 
AS SELECT PROD_ID, 
QTY, 

lQ AliJKEST (CREATE VIEW VENDOR_INFO 

(VNDR_ID CHAR (5) , 
VNDR_CITY CHAR (65)) AS 
SELECT VNDR_ID, VNDR_CITY 
FROM VENDORS 

15 WHERE VENDORS . PROD_ID hi . PROD ID 

) 

AS VEND OR__ INFO 
FROM ORDERS hi 

WHERE hi . ORDER_ID = LO . ORDER_ID AND 
20 I/I. CID - LO.CID AND 

LI. CCITY hO . CCITY 

) 

AS PROD_INFO, 
CID, 

25 CCITY 

FROM ORDERS hO 

The explicit FROM clause prevents the usage of the VENDORS in the 
outermost select. This may produce a nested table as shown in Fig. 22, except with three 
30 rows with ORDER JDD equal to 100, two rows with ORDER JD equal to 200 and one 
row with ORDERJD = 300, because ALJMEST operates on a row basis, which can 

produce duplicates. 

The AL_NEST operator may be used to perform nesting on a set of rows 
also. If there is a GROUP BY, the set formed by the GROUP BY is used. If there are 
35 aggregate functions and a GROUP BY is specified, the set formed by the GROUP BY is 
used. If there are aggregate functions and a GROUP BY is not specified, then the default 
grouping is the entire table. All nested tables in the set operated by the AL_NEST may 
be merged. 

Using AL NEST SET with an Aggregate Function 
40 This operation may take in a view with nested tables and produce a single 

row, which has count of ORDER_JD's and the merge of all nested tables: 

CREATE VIEW V2 (NUMJ3RDERS INT, 

PROD_INFO AL_NESTEDJTABLE (PROD_ID INT, 

QTY INT 

45 ) 

) 

AS SELECT COUNT <ORDER_ID) , 

At._NEST_SET (CREATE VIEW PROD_INFO (PROD_ID INT, 
~" QTY INT) AS 

17 
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SELECT PROD_±r>, QTY 
FROM PROdJEnFO 

) 

AS PROD_INFO, 

5 

FROM VI 

Such a query might produce the table shown in Fig. 24. If the nested table(s) SELECT(S) 
have WHERE clauses, the nested table(s) might first be merged and the filters applied to 
the merged table(s). , 
10 AL UNNEST 

The ALJJNNEST operator transforms a relation into one that is less 
deeply nested by concatenating each tuple in the relation being unnested to the remaining 
attributes in the relation. To unnest the vendor information from the nested table in Fig. 
22, the following ATL might be defined: 

15 CREATE VIEW V2 (ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE (PROD_lD INT, 

QTY INT, 

VKDR_ID CHAR (5))) 

AS SELECT ORDER_ID, 

20 AL_NEST (CREATE VIEW PROD__INFO (PROD_ID INT, QTY INT) AS 

SELECT Vl.PROD_INFO.PROD_ID, 
VI . PROD_INFO . QTY , 

ALJJNNEST (CREATE VIEW VDR_INFO 
(VNDR_ID INT) AS 

25 SELECT 

VI . PROD_INFO . VEND OR__ INFO . VNDR_ID 
FROM VI . PROD_INFO . VENDOR_INFO) 

FROM Vl.PROD_INFO) 
AS PROD_INFO 

30 FROM VI 

WHERE clauses can be applied in the SELECT for unnesting by drilling 
into the nested table which would produce a query transform, specifying the condition 
there, as shown in the following example: . 

CREATE VIEW V2 (VNDR_ID CHAR (5), VNDRJCITY CHAR (65)) 
35 AS SELECT DISTINCT AL_UNNEST (CREATE VIEW 

UNEST1 ( VNDR_ID CHAR ( 5 ) , 
VNDR_CITY CHAR ( 6 5 ) ) 
AS SELECT 

AL_UNNEST (CREATE VIEW 
40 UNF:ST2 (VNDR_ID CHAR (5) , 

VNDR_CITY 
CHAR (65) ) 
AS SELECT VNDR_ID, VNDRJTITY 
FROM VEND OR_ INFO ) 
45 * FROM PROD_INFO 



) 



FROM VI 



Project 

An example of a simple projection from one hierarchical . structure to 
50 another would be: 
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CREATE VIEW V2 ( 

ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE(PROD_ID INT, QTY INT) 

) 

5 AS SELECT ORDER_ID, 

ALJNBST (CREATE VIEW PROD_lNFO (PROD_ID INT, QTY INT) 
AS SELECT VI . PROD_INFO . PROD_ID , VI . PROD_INFO . QTY 
FROM Vl.PROD_INFO) 
AS PROD_INFO 

10 FROM VI 

The qualifier VI .PROD^INFO in the nested relation is not really needed; the nested 
query could have been written using just PROD_INFO. The result might be the table 
shown in Fig. 25. 
Select 

j 5 Filter conditions can be applied at various levels. Consider the example of 

view VI (Fig. 22) that has three levels of nesting. A filter on the nested relation 
PRODJNFO might be implemented as follows: 

CREATE VIEW V3 {ORDER_ID INT, 

PROD_INFO AL_NESTED_TABLE (PROD ID INT, QTY INT) 

20 > 
AS SELECT 

ORDER__ID, 

A£_NEST (CREATE VIEW PROD_INFO ( PROD_ID INT, QTY INT) 

AS SELECT VI . PROD_INFO . PROD_ID , 
25 VI . PROD_INFO . QTY 

FROM Vl.PROD_INFO 
WHERE VI . PROD_INFO . QTY > 50) 

AS PROD_INFO 

FROM VI 

This may select all the rows from VI, but for the nested table PRODJMFO, only those 
rows are chosen where the quantity ordered QTY is greater than 50, resulting in the table 
shown in Fig. 26. 

Alternate Support For Filters In The WHERE Clause 

35 For a nested table to be used in a WHERE clause sub-query, support 

within a WHERE clause should be available. If such support is not available, it can be 
overcome by using two stages and the ISEMPTY operator for nested tables. Nested 
tables can be used in a WHERE clause only with the ISEMPTY operator. The following 
example illustrates the use, selecting all the rows from VI that have ORDERJOD greater 

40 than 100 and that have at least one product with a quantity ordered greater than 50. 

CREATE VIEW V3 (ORDER_ID INT, 

PROD_INFO AL_NE ST EDITABLE (PROD ID INT , QTY INT) , 

TEMP_PROD__INFO AL_NESTED_TABI,E ( PROD_ID INT, QTY 

INT) 

45 ) 

AS SELECT. 

ORDER_ID, 

19 
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AL_NEST (CREATE VIEW PROD_INFO (PROD_ID INT, QTY INT) 
AS SELECT V1.PROD_INFO.PROD_ID, 

• VI . PROD_INFO . QTY 
FROM VI . PROD_INFO 

5 ) 

AS PROD INFO , ^ r . 

ALJTEStTcREATE VIEW PROD^INFO (PROD^ID INT, QTY INT) 
AS SELECT VI . PROD_INFO . PROD_ID , 
VI . PRODJCNFO . QTY 
10 FROM Vl.PROD_INFO 

WHERE VI . PRODJLNFO . QTY > 50) 
AS TEMP PROD_INFO 



15 



FROM VI WHERE VI.ORDERJID > 100 
CREATE VIEW V4 (ORDER ID INT, 

CREATE v±*w \ ROD J NFQ AL _ NES TED_TABLE(PROD_ID INT , QTY INT) 



) 

AS SELECT 
20 ORDER_ID, 



AL_NEST (CREATE VIEW PROD^INFO (PROD_ID INT, QTY INT) 
AS SELECT VI . PROD_INFO . PROD_ID , 
VI . PROD_INFO . QTY 
FROM Vl.PROD_INFO 



25 > 

AS PROD INFO 



Join 



FROM V3 WHERE I IS EMPTY (TEMP_PROD_INFO) 

30 Nested relations can be joined with any other relations. An example is 

given below: 

rRFATE VIEW ORDERS (ORDERID INT, PRODUCTS 

CREATE VIE* °^^ STED _ TABLE ( PRO DID INT, PRODNAME VARCHAR (10))) 

^5 • CREATE VIEW VENDORS (PRODID INT, VENDORID INT, 

^ CK^Aiii vxnn VENDORNAME VARCKAR (10)); 

CREATE VIEW ORDERS WITH_VENDORS {ORDERID INT, , nonnTn 
CR*A1* vj.* _ PRODUCTS AIi NESTED_TABLE (PRODID INT, 

PRODNAME VARCHAR (10) 
4 0 VENDORID INT) 

AS 

SELECT ORDERID, 

AL NEST (CREATE VIEW PRODUCTS (PRODID INT, . 

- PRODNAME VARCHAR (10) , 

4 -> VENDORID INT) 

AS SELECT PRODID, PRODNAME, VENDORID 
FROM PRODUCTS, VENDORS 

WHERE PRODUCTS . PRODID «= VENDORS . PRODID) 

50 AS PRODUCTS 

FROM ORDERS GROUP BY ORDERID 
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Nested Table Transform ' 

A system transform is available that takes in a flat view and produces a 
singleton that has a N integer scalar column with a value 1 , and a nested table containing 
the input view. 
5 Tables as Parameters 

Tables can be used as parameters for imported functions. Given a function 
get_orders with an input parameter customer Jd and an output parameter orders: 

CREATE FUNCTION get_orders (cust_id int, 

orders AL_ NESTED_TABIiE (order_id int, ...) 

iq OUTPUT, 

cust_info AL_NESTED__TABLE (cust_name, . .) 

OUTPUT) ; 

Get orders for each customer by calling the orders function: 

15 CREATE VIEW customs reorders (customer_id int, 

orders AIi_NESTED_TABLE (order_id 

int, -.)) 

AS SELECT customer^id, 

AL_NEST (get_orders (cus tomer_id) : : orders) 
20 AS orders 

FROM customers; u 

if the function has multiple tables as outputs, and all or some of them are required, then 
the function has to be invoked multiple times: once for each output. 

CREATE VIEW cus tomer_orders ( cus tomer_id int, 
25 ' cust_info AI»_NE STED__T ABLE (cus t_name , . .) , 

orders AL_NESTED_TABIiE (order_id 
int, ...)) 

AS SELECT customer_id, § 

Ali NEST (get_orders (customer_id) : :cust_inro) AS 

30 cust info « — ■ j 

AL_NEST (get_orders (customer_id) :: orders) AS orders 

FROM customers ; 

As an optimization, the system could invoke the function only once and use those results 
for different instances within the query transform. For mapping a function returning 

35 table, a user would create a nested table column and map the nested table column to the 
function returning a table. The schema of the nested table may then be identical to the 
schema returned by the function. This is a concept of a "generated table". The schema 
definition of generated table cannot be modified, and it should disappear when the. 
function is removed from the mapping. It should be represented differently in the UI so 

40 that a user can distinguish between a generated table and a non-generated table. 
Hierarchical File Reader 

A hierarchical file reader reads data generated by data flows that have 
functions that return tables. There are two main alternatives: model the file reader as an 
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XML file reader or model the file reader using a proprietary format to represent 
hierarchical data. 

Pffer.tnfNRDM or Systgm Transforms 

Sys tem transforms such as Table_Comparison, Hierarchy JFlatternng, etc. 

accept only rows with scalar columns. 

T^ble_ComEarison: The output schema of the table comparison transform 

is a generated schema and is same as the schema of the table being compared against. 
This transform may silently ignore columns that are nested tables. 

gj^ tnry Preserving : The output schema of the history preserving transform 
is same as the input schema, and this transform may preserve history only scalar columns 
and may act as pass through for columns that are nested tables. 

Fffer.tivePate : The transform may act as pass through for columns that are 

nested tables. 

^Generation : The output schema of the key generation transform is 
same as the input schema, and this transform may act as pass through for columns that are 
nested tables. 

Map O peration : The output schema of the map operation transform is 
same as the input schema, and this transform may not allow operations to be mapped for 
columns as nested tables and may act as pass through for them. 

VKmrrhv Flattening: Columns as nested tables cannot be a parent or child 
column of a hierarchy, but they can be dragged and dropped attribute columns and thus 

can appear in the output schema. 

Pivot ; The output schema of the hierarchy flattening transform is a 

generated schema and columns, as nested tables may be ignored. 

25 A rase Study 

A case study of a Sales Order IDoc using NRDM was performed. The 
IDoc was captured in a NRDM and perform transformations, to arrive at the same result 
as if the NRDM was not used, but with simplified specification of the transformations. 

An TDoc is divided into a control record, data records and a status record. 
Each control record and status record has numerous fields. For our purpose of validating 
the NRDM, we treated control records and status records as single varchar columns. The 
ATL to represent a Sales Order (some of the columns associated with nested tables might 
be omitted in the listing) is: 
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CREATE VIEW VI ( 

CONTROL_RECORD VARCHAR (100) , 
STATU S_RECORD VARCHAR (100) , 
E2CMCCO AL_NESTED_TABLE ( : 
ZEITP VARCHAR (2), . ■ , 
E2CVBUK AI>_NESTEDJTABLE ( 

SUPKZ VARCHAR (l) , 
E2CVBAK AL_NESTED_TABLE ( 

SUPKZ VARCHAR (1) / 
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35 



40 



45 



50 



55 



60 



AL NESTEDJTABIjE ( 



AL_NESTEDJTABLE ( 



AI*_NESTED_TABIiE ( 



Al*_NESTED_TABIiE ( 



AIi_NESTED_TABI,E ( 



AL_NESTED_TABLE ( 



AIj_NBSTED_TABIiE ( 



AL NESTEDJTABLE ( 



E2CVBK0. AL_NESTED__TABLE ( 

SUPKZ VARCHAR CD , 

E2CVBP0 AL_NESTED_TABLE ( 

SUPKZ VARCHAR 

(1), 

) , 

E2CVBAP AL_NESTED_TABIiE ( 

SUPKZ VARCHAR (1) , 
E2CVBA2 



VARCHAR (1) , 
) , 



SUPKZ 



AL_NESTEDJTABLE ( 



E2CVBUP 

SUPKZ 
VARCHAR (1) , 
) , 

E2CVBPF 

SUPKZ 
VARCHAR (1) 
) r 

E2CV3KD 

SUPKZ 
VARCHAR (1) , 
>, 

E2CKONV 

SUPKZ 
VARCHAR (1) , 
) , 

E2CVBPA 

SUPKZ 
VARCHAR (1) , 
) i 

E2CVBFA 

SUPKZ 
VARCHAR (1) , 
) , 

E2CFPLT 

SUPKZ 
VARCHAR (1)., 
) , 

E2CVBEP 



23 



~.DOC ID: <WO 0lS9GO2A1_IA> 



WO 01/059602 



PCT/US01/04698 



SUPKZ 
VARCHAR (1) , 
) , 

) , # E2CVBAP 
5 ) , # B2CVBAK 

) , # E2CVBUK 

) # E2CMCC0 

) # VI 
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The ATL corresponding to the population of the sales order fact table from 
the above view may be (with some columns omitted for illustration purposes): 



CREATE VIEW V2 ( SO TO, # VBAK. VBELN 

CREATE VIA* soLDJTO, # VBAK. KUNNR 

1<; LINE_ITEM_ID, # VBAP.POSNR 

13 CREATE^JD ATE , # VBAP.KRDAT 

SHIPJTO, # VBPA . KUNNR 
DELIVERY_STATUS # VBUP . LFGSA 

j 

20 AS SELECT AL_UNNEST 

(SELECT AL_UNNEST 

(SELECT AL_TTNNEST 

(SELECT VBELN", KUNNR, 

AL UNNEST (SELECT POSNR, ERDAT, 

" AL_UNNEST (SELECT KUNNR FROM 

E2CVBPA WHERE PARVW = 

'WE') , AL_UNNEST (SELECT LFGSA FROM 

30 E2CVBUP) FROM E2CVBAP 

) 

FROM V1.E2CMCC0.E2CVBUK.E2CVBAK 
) 

^ s FROM VI . E2CMCC0 . E2CVBUK 

* ) 

FROM VI . E2CMCC0 

) 

FROM VI 

40 
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WHAT IS CLAIMED IS: 

1 L An apparatus for processing data representable in a hierarchical form, 

2 the apparatus comprising: 

3 an importer having inputs to receive a schema and a structured document from a data 

4 source, wherein the importer outputs a first nested relational data model 

5 (NRDM) data structure representing the structured document according to the 

6 received schema; 

7 an transformation engine that is capable of transforming the first NRDM data 

8 structure output by the importer into a second NRDM data structure according to 

9 a declarative specification of a transform; and 

10 an exporter having an input to receive the second NRDM data structure, wherein the 

1 1 exporter outputs a transformed hierarchical document in a data structure other 

12 than an NRDM data structure in a form suitable for a data target. 

1 3 2. The apparatus of claim 1, further comprising means for converting 

14 relational data to an NRDM data structure by vertically partitioning a relation and nesting 

15 parts of the relational data as a nested table. 

16 3. The apparatus of claim 1, further comprising means for converting 

17 nested relational data to relational data by unnesting the nested tables using a 

1 8 cross-product between a parent row and a child subtable. 

2 g 4. The apparatus of claim 1, further comprising means for performing a 

20 grouping operation on a nested table that generates a resulting nested table containing a 

2 1 union of all the nested tables grouped by the operation. 

22 5. The apparatus of claim 1, further comprising means for performing 

23 multi-step transformations, wherein an input to a transformation is results of a previous 

24 transformation, a data source, or both. 

25 6. The apparatus of claim 1, wherein the transformation engine operates on 

26 rules that are applied to data independent of data format. 

27 7. The apparatus of claim 1, wherein the exported is adapted to output one 

28 or more of an XML file, a relational table or a flat file. 

25 
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29 8. A metadata mapper comprising: 

30 an input for receiving a document description for hierarchical documents; and 

3 1 an output for outputting an NRDM data structure representing the document 

4 

32 description. 

33 9 . An apparatus for transforming data representable in a hierarchical form, 

34 the apparatus comprising: 

35 an importer having inputs to receive a schema and a structured document from a data 
• 36 source, from a data transformer, or from both, wherein the importer outputs a 

37 first nested relational data model (NRDM) data structure representing the 

3 g structured document according to the received schema; 

39 an transformation engine that is capable of transforming the first NRDM data 

40 structure output by the importer into a second NRDM data structure according to 

41 a declarative specification of a transform; and 

42 an exporter having an input to receive the second NRDM data structure, wherein the 

43 exporter outputs a transformed hierarchical document in a data structure other 

44 than an NRDM data structure in a form suitable for a data target. 

45 1 0 ; A method for providing data to an application through a data platform 
~ 46 in a computer system in response to request from the application, the method comprising: 

47 accepting declarative rules for accessing the data from data sources and declarative 

48 rules for transforming the data into a format requested by the application; 

49 mapping relational and non-relational data sources to an NRDM data structure; 

50 interpreting a request; 

5 1 retrieving data from the data sources; 

52 transforming the data according to the declarative rules; and 

53 returning the transformed data to the application. 

54 • 11. The method of claim 10, wherein requests are processed as messages 

55 and request messages contain sufficient information to drive data extraction into a 

56 data-oriented interface. 

57 12. The method of claim 10, wherein the requests are application 

58 programming interface function calls. 
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59 13. A method for updating a plurality of data targets from a message, 

60 comprising: 

61 making an update request through a data-oriented interface; 

62 specifying declarative rules for updating the data targets; 

63 importing metadata that maps relational and non-relational data targets to NRDM 

64 data structures; 

65 interpreting incoming update requests; 

66 transforming the data according to the declarative rules; and 

67 updating the data targets. 

68 14. The method of claim 13, further comprising: 

69 making an update request using an application; and 

70 causing one of a response to be sent to the application, an update of data, or both. 

71 15, The method of claim 13, further comprising a step of combining the 

72 update request with other data before updating the data targets. 

73 i6. A method of providing input to an application expecting one or more 

74 tables as parameters to an input message, the method comprising: 

75 mapping data in a NRDM data structure to function parameters; and 

76 making a function calls to the application using the NRDM mapped data structure. 
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